Problem Description
The following warning is seen in the Aerospike server logs:
WARNING (batch): (batch.c::755) Failed to find active batch queue that is not full
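To gauge how often the warning occurs, the log can be searched for the message text. The following is a minimal sketch: the helper name count_batch_queue_warnings is hypothetical, and the log path in the usage comment is an assumption that depends on the logging configuration of the node.

```shell
# Hypothetical helper: count occurrences of the batch queue warning in a log file.
# $1 is the path to the Aerospike log file.
count_batch_queue_warnings() {
    # -F matches the message as a fixed string, -c prints the number of matching lines.
    grep -c -F "Failed to find active batch queue that is not full" "$1"
}

# Usage (log path is an assumption, adjust to the configured log file):
#   count_batch_queue_warnings /var/log/aerospike/aerospike.log
```

A count that keeps growing under normal load suggests the batch settings discussed below are worth reviewing.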
Explanation
This warning is logged when batch-max-buffers-per-queue is exceeded for all batch-index-threads on the node.
Solution
Some parameters can be tuned to accommodate batch transactions, but they should always be changed carefully, while measuring the impact on performance for the rest of the system.
batch-index-threads can be increased. For example:
asadm -e 'asinfo -v "set-config:context=service;batch-index-threads=8"'
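This parameter is set by default to the number of CPU cores available. Increasing it will increase memory usage. For example, in the top output for the asd process shown here, virtual memory (VIRT) grew from 3581m to 3625m and resident memory (RES) grew from 122m to 128m: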
When batch-index-threads was 4:
18055 root 20 0 3581m 122m 3912 S 2.0 0.4 0:03.79 asd
After batch-index-threads was set to 8:
18055 root 20 0 3625m 128m 3928 S 0.0 0.4 0:06.08 asd
batch-max-buffers-per-queue can also be increased:
asadm -e 'asinfo -v "set-config:context=service;batch-max-buffers-per-queue=1024"'
Run the following command to verify the changes:
asadm -e "show config like batch"
e.g.
~~~~~~~~~~~~~~~~~~~~~~~~~~Service Configuration~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                       :   192.168.100.192:3000   192.168.100.207:3000
batch-index-threads        :   8                      8
batch-max-buffers-per-queue:   1024                   1024
batch-max-requests         :   5000                   5000
batch-max-unused-buffers   :   256                    256
batch-priority             :   200                    200
batch-threads              :   4                      4
query-batch-size           :   100                    100
Finally, update aerospike.conf so that the changes are permanent and do not revert the next time the service is restarted.
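As an illustration, the two values used in the dynamic commands above could be persisted as follows. This is a sketch only: it assumes both parameters belong to the service context, as the set-config commands indicate, and it shows only the added lines, not a complete configuration.

```
service {
    # Existing service settings stay as they are.
    batch-index-threads 8
    batch-max-buffers-per-queue 1024
}
```

The values 8 and 1024 are the example values from this article, not recommendations. Appropriate values depend on the workload and on the memory available on the node.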
Notes
- The batch-max-unused-buffers parameter controls when unused 128 KiB buffers are garbage collected; its default is 256. It should be tuned to a level where normal load does not constantly trigger garbage collection.
- A slow-processing client or a long-running batch will not slow down the other batch transactions queued on the affected batch-index thread. The statistic batch_index_delay is incremented every time such a slow batch transaction is encountered, and a warning message is logged when the delay exceeds the allowed threshold (either twice the client total timeout, or 30 seconds if the timeout is not set on the client). In older versions, the batch socket send timeout was hard-coded to 10 seconds, which meant that a slow client or a huge batch could bottleneck an entire batch-index thread.
- Configuration parameters related to batch:
- Statistics related to batch:
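To watch the batch_index_delay statistic mentioned in the notes, a single value can be extracted from the node statistics. The following is a minimal sketch: the helper name get_stat is hypothetical, and it assumes the statistics arrive as name=value pairs separated by semicolons or newlines, which is how asinfo prints them.

```shell
# Hypothetical helper: print the value of one statistic from name=value pairs on stdin.
# $1 is the statistic name, for example batch_index_delay.
get_stat() {
    # Put each name=value pair on its own line, then print the value of the matching name.
    tr ';' '\n' | awk -F= -v key="$1" '$1 == key { print $2 }'
}

# Usage against a running node:
#   asinfo -v "statistics" | get_stat batch_index_delay
```

Sampling the value at intervals and comparing successive readings shows whether slow batch transactions are still being encountered.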