
Batch queue full error

Problem Description

The following warning is seen in the Aerospike server logs:

WARNING (batch): (batch.c::755) Failed to find active batch queue that is not full

Explanation

This warning is logged when the batch-max-buffers-per-queue limit has been reached on every batch-index thread on the node, so no batch queue has room for the incoming batch request.
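To gauge how often the condition occurs, the warning can be counted in the server log. The following is a minimal sketch: the function name count_batch_queue_full is hypothetical, and the default log path is an assumption (the actual location depends on the logging configuration in aerospike.conf).

```shell
#!/bin/sh
# Count occurrences of the batch-queue-full warning in a log file.
# The default path below is an assumption; pass the real path as $1.
count_batch_queue_full() {
    log_file=${1:-/var/log/aerospike/aerospike.log}
    # -F matches the message as a fixed string; -c prints the number of matching lines.
    grep -cF 'Failed to find active batch queue that is not full' "$log_file"
}

# Usage (not run here):
#   count_batch_queue_full /path/to/aerospike.log
```

A count that keeps growing under normal load suggests the queues are undersized for the workload, rather than a one-off burst.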

Solution

Some parameters can be tuned to accommodate batch transactions, but they should always be changed carefully, while measuring the impact on performance for the rest of the system.

batch-index-threads can be increased. For example:

asadm -e 'asinfo -v "set-config:context=service;batch-index-threads=8"'

By default, this parameter is set to the number of available CPU cores. Increasing it increases memory usage, as the following process listings for asd show:

When batch-index-threads was 4:

18055 root      20   0 3581m 122m 3912 S  2.0  0.4   0:03.79 asd

After batch-index-threads was set to 8:

18055 root      20   0 3625m 128m 3928 S  0.0  0.4   0:06.08 asd

batch-max-buffers-per-queue can also be increased:

asadm -e 'asinfo -v "set-config:context=service;batch-max-buffers-per-queue=1024"'
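Before raising either value, it can help to estimate how much memory the batch response buffers could occupy if every queue filled up. The sketch below is only a rough upper bound under a stated assumption: it treats every buffer as the 128 KiB size mentioned in the Notes and multiplies by the number of queues (one per batch-index thread). The function name batch_buffer_mib is hypothetical, and actual memory usage will differ.

```shell
#!/bin/sh
# Rough upper bound, in MiB, of memory held by batch buffers when all queues are full.
# Assumption: one queue per batch-index thread and a fixed 128 KiB per buffer.
batch_buffer_mib() {
    threads=$1
    buffers_per_queue=$2
    buffer_kib=128
    # threads * buffers * KiB per buffer, converted from KiB to MiB (integer division).
    echo $(( threads * buffers_per_queue * buffer_kib / 1024 ))
}

# Values used in the examples in this article: 8 threads, 1024 buffers per queue.
batch_buffer_mib 8 1024
```

With the values above, the estimate comes to 1024 MiB, which is worth comparing against the memory available on the node before applying the change.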

Run the following command to verify the changes:

asadm -e "show config like batch"

Example output:

~~~~~~~~~~~~~~~~~~~~~~~~~~Service Configuration~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                       :   192.168.100.192:3000   192.168.100.207:3000
batch-index-threads        :   8                      8
batch-max-buffers-per-queue:   1024                   1024
batch-max-requests         :   5000                   5000
batch-max-unused-buffers   :   256                    256
batch-priority             :   200                    200
batch-threads              :   4                      4
query-batch-size           :   100                    100

Finally, update aerospike.conf so that the changes are permanent and are not reverted the next time the service is restarted.
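A sketch of the corresponding static configuration, assuming the same values as the dynamic commands above and that both parameters belong to the service context (as the set-config commands indicate). The two lines should be merged into the existing service stanza rather than added as a second stanza.

```
service {
    # ... existing service settings stay as they are ...
    batch-index-threads 8
    batch-max-buffers-per-queue 1024
}
```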


Notes

  • The batch-max-unused-buffers parameter controls when unused 128 KiB buffers are garbage collected; the default is 256. It should be tuned to a level where normal load does not constantly trigger garbage collection.

  • A slow client or a long-running batch does not slow down the other batch transactions queued on the affected batch-index thread. The batch_index_delay statistic is incremented every time such a slow batch transaction is encountered, and a warning message is logged when the delay exceeds the allowed threshold (twice the client total timeout, or 30 seconds if no timeout is set on the client). In older versions, the batch socket send timeout was hard-coded to 10 seconds, which meant that a slow client or a huge batch could bottleneck an entire batch-index thread.

  • Configuration parameters related to batch:

  • Statistics related to batch:
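The batch_index_delay statistic described in the notes above can be tracked over time by pulling a single value out of the node statistics. The following is a minimal sketch, assuming the statistics arrive as semicolon-separated name=value pairs (the format returned by asinfo -v "statistics"); the helper name stat_value is hypothetical and the sample values are invented for illustration.

```shell
#!/bin/sh
# Print the value of one statistic from semicolon-separated name=value pairs on stdin.
stat_value() {
    # Put each pair on its own line, then print the value whose name matches $1 exactly.
    tr ';' '\n' | awk -F= -v name="$1" '$1 == name { print $2 }'
}

# Illustrative input only; these values are invented, not real server output.
sample='batch_index_initiate=120;batch_index_delay=3;batch_index_error=0'
printf '%s\n' "$sample" | stat_value batch_index_delay

# Against a live node (not run here):
#   asinfo -v "statistics" | stat_value batch_index_delay
```

Sampling the value at intervals and comparing successive readings shows whether slow batch transactions are still occurring after the tuning changes.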


Applies To Earliest Version

Pre 4.9

Applies To Latest Version

Current Version