Context
When the XTQ (XDR Transaction Queue) fills up, the queue is dropped and XDR goes into recoveries. In a recovery, XDR scans the full primary index and compares each record's LUT (Last Update Time) against the LST (Last Ship Time) to decide what to ship, instead of using the XTQ.
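The comparison made during a recovery can be pictured with a small shell sketch. It is an illustration only, not the server's implementation: the function name needs_ship and the integer timestamps are hypothetical, and it assumes a record is shipped when its LUT is strictly greater than the LST.

```shell
# Sketch only: succeeds (exit 0) when a record's LUT is newer than the LST.
# Both arguments are hypothetical integer timestamps in the same unit.
needs_ship() {
    lut=$1
    lst=$2
    [ "$lut" -gt "$lst" ]
}

# A record updated after the last ship time would be shipped.
if needs_ship 1700000500 1700000000; then echo "ship"; else echo "skip"; fi
# A record last updated before the last ship time would be skipped.
if needs_ship 1699999000 1700000000; then echo "ship"; else echo "skip"; fi
```

Because every record in the primary index has to be checked this way, recoveries cost far more than shipping from the XTQ.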
Method
Recoveries were optimized in version 6.1 (improvement AER-6565: "(XDR) Distribute recovery/rewind requests across service threads using partition affinity, avoiding lock contention and significantly improving performance.").
Prior to this optimization, the following settings could be temporarily tuned to help XDR get out of recoveries faster:
1) The transaction-queue-limit can be increased dynamically so that the queue holds more entries before it is dropped and XDR goes into recoveries. The maximum value is 1048576. Check memory usage before making the change and monitor it afterwards.
asinfo -v 'set-config:context=xdr;dc=<DCNAME>;namespace=<NAMESPACE>;transaction-queue-limit=1048576'
2) You can also temporarily limit the number of partitions recovered concurrently by setting max-recoveries-interleaved.
asinfo -v "set-config:context=xdr;dc=<DCNAME>;max-recoveries-interleaved=10"
3) You can set max-used-service-threads (default: 0, meaning all service threads are used) to limit the number of service threads that process the XDR queue. Temporarily reducing it to a single-digit positive value should help with recoveries in these XDR versions. Afterwards, revert it to a value equal to or less than service-threads.
asinfo -v "set-config:context=xdr;dc=<DCNAME>;max-used-service-threads=5"
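Since these settings are meant to be reverted later, it can help to record the current value before changing it. The following is a minimal sketch that assumes the get-config response is a semicolon-separated list of key=value pairs. The helper name info_value and the sample response are hypothetical and shortened for illustration.

```shell
# Extract one value from a semicolon-separated key=value info response.
# Usage: info_value KEY "RESPONSE"
info_value() {
    printf '%s\n' "$2" | tr ';' '\n' | awk -F= -v k="$1" '$1 == k { print $2; exit }'
}

# Hypothetical, shortened sample of a namespace-level XDR config response.
sample='enabled=true;max-recoveries-interleaved=0;transaction-queue-limit=16384'

# Print the value to note down before applying a temporary change.
info_value transaction-queue-limit "$sample"
```

On a live node the response would come from asinfo -v "get-config:context=xdr;dc=<DCNAME>;namespace=<NAMESPACE>" rather than from the sample string.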
Monitor these changes and, once recoveries have completed, revert the settings to their previous values or to values better tuned for the workload.
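One way to tell when recoveries have completed is to watch the recoveries_pending statistic in the per-DC XDR stats. The sketch below assumes the stats response is a semicolon-separated list of key=value pairs. The function name recoveries_done and the sample response are hypothetical.

```shell
# Succeeds (exit 0) when the stats response reports recoveries_pending=0.
# Usage: recoveries_done "RESPONSE"
recoveries_done() {
    pending=$(printf '%s\n' "$1" | tr ';' '\n' |
        awk -F= '$1 == "recoveries_pending" { print $2; exit }')
    [ "$pending" = "0" ]
}

# Hypothetical, shortened sample of a per-DC XDR stats response.
sample='lag=0;in_queue=0;recoveries=12;recoveries_pending=0'

if recoveries_done "$sample"; then
    echo "recoveries complete"
else
    echo "recoveries still pending"
fi
```

On a live node the response would come from asinfo -v "get-stats:context=xdr;dc=<DCNAME>". Checking it repeatedly over a few minutes is safer than relying on a single reading before reverting the settings.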
Notes
The workaround is not needed in version 6.1 and above.
Applies To Earliest Version
5.0
Applies To Latest Version
6.0