What metrics can be used to determine a correct value for xdr-read-threads?

Detail

This article applies to XDR prior to server version 5.0. When shipping data to a remote data centre via XDR, the architecture is as described below.

The digest log stores the digests of records incoming to the node. Digest log reader threads read the digests from the digest log and put them onto in-memory read request queues. Threads known as xdr-read-threads pick the digests from these in-memory request queues, process them through the de-duplication cache, schedule the read for the associated record via the service threads (or transaction queues/threads for versions prior to 4.7), potentially apply compression and, finally, pass the records to an embedded Aerospike C client which ships them to the destination cluster(s).
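The flow described above can be pictured with a small toy model. This is only an illustrative sketch in Python, not the server implementation: the function name run_pipeline, the sample digests and the use of a plain set as the de-duplication cache are all assumptions made for the example (the real cache is more sophisticated, and the local record read, compression and shipping steps are reduced to a list append).

```python
import queue
import threading


def run_pipeline(digest_log, num_read_threads=4):
    """Toy model of the pre-5.0 XDR read path (hypothetical, simplified).

    Returns the sorted list of digests that would be shipped.
    """
    request_queue = queue.Queue()   # stand-in for the in-memory read request queue
    seen = set()                    # stand-in for the de-duplication cache
    shipped = []                    # stand-in for the embedded C client
    lock = threading.Lock()
    stop = object()                 # sentinel telling a worker to exit

    def read_thread():
        while True:
            digest = request_queue.get()
            if digest is stop:
                break
            with lock:
                if digest in seen:
                    # Already handled: skip the duplicate.
                    continue
                seen.add(digest)
                # A real node would read the record locally and optionally
                # compress it here before handing it to the client.
                shipped.append(digest)

    workers = [threading.Thread(target=read_thread)
               for _ in range(num_read_threads)]
    for worker in workers:
        worker.start()

    # The "digest log reader" role: feed digests onto the request queue.
    for digest in digest_log:
        request_queue.put(digest)
    for _ in workers:
        request_queue.put(stop)
    for worker in workers:
        worker.join()
    return sorted(shipped)


if __name__ == "__main__":
    print(run_pipeline(["a", "b", "a", "c", "b"], num_read_threads=2))
```

In this model, adding read threads only helps when the work done per digest is the bottleneck, which is the situation the metrics below are intended to identify.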

A build up in xdr_timelag may be observed due to slow performance of the tasks assigned to the xdr-read-threads. In that instance, increasing the number of xdr-read-threads may be an appropriate solution. What Aerospike metrics exist to determine whether an increase in xdr-read-threads could alleviate the xdr_timelag?


Answer

The following metrics are used to track the behaviour of xdr-read-threads. These are fully documented in the Aerospike Metrics Reference.

  • xdr_read_active_avg_pct. This describes the proportion of time the xdr-read-threads spend working as opposed to waiting for digests to appear on the queues they service. A high percentage for this metric along with high CPU usage may indicate a need to increase the number of xdr-read-threads. When the CPU is at lower utilisation, the expectation is that the default number of xdr-read-threads should be sufficient to handle the XDR load.
  • xdr_read_reqq_used_pct. This gives, as a percentage, how full the read request queues are. This metric should be used with care. A slow disk will also cause this metric to be high, so on its own it is not a good indicator of a need to increase xdr-read-threads.
  • There is a maximum of 10,000 transactions that can be in flight in the internal XDR transaction queue. This is a hard limit and, as such, if there are 10,000 transactions (or close to that number) in flight, increasing the number of xdr-read-threads will not solve a source-side lag issue. This is measured in raw numbers by xdr_read_txnq_used or as a percentage by xdr_read_txnq_used_pct.
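The reasoning in the list above can be sketched as a small check. This is a minimal Python sketch under stated assumptions: it assumes the statistics are available as a semicolon-separated string of key=value pairs, and the function names, sample values and all thresholds (80% active, 70% CPU, 95% transaction queue usage) are hypothetical choices for illustration, not Aerospike recommendations. Only the metric names and the 10,000 limit come from this article.

```python
TXNQ_HARD_LIMIT = 10000  # hard limit on in-flight transactions, per this article


def parse_stats(raw):
    """Parse 'k1=v1;k2=v2' into a dict of strings (assumed input format)."""
    stats = {}
    for pair in raw.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            stats[key] = value
    return stats


def assess(stats, cpu_pct,
           active_threshold=80.0,   # hypothetical threshold
           cpu_threshold=70.0,      # hypothetical threshold
           txnq_threshold=95.0):    # hypothetical threshold
    """Return a coarse hint about whether more xdr-read-threads may help."""
    active = float(stats.get("xdr_read_active_avg_pct", 0))

    # Prefer the percentage metric; otherwise derive it from the raw count.
    if "xdr_read_txnq_used_pct" in stats:
        txnq_pct = float(stats["xdr_read_txnq_used_pct"])
    else:
        txnq_used = float(stats.get("xdr_read_txnq_used", 0))
        txnq_pct = 100.0 * txnq_used / TXNQ_HARD_LIMIT

    # At or near the hard limit, more read threads will not help.
    if txnq_pct >= txnq_threshold:
        return "txnq-saturated"
    # Busy read threads together with busy CPU: an increase may be worth trying.
    if active >= active_threshold and cpu_pct >= cpu_threshold:
        return "consider-increase"
    # xdr_read_reqq_used_pct is deliberately ignored: a slow disk inflates it.
    return "no-change"


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = ("xdr_read_active_avg_pct=92;"
              "xdr_read_reqq_used_pct=40;"
              "xdr_read_txnq_used_pct=12")
    print(assess(parse_stats(sample), cpu_pct=85.0))
```

The CPU figure is passed in separately because it comes from the operating system rather than from the XDR statistics. Any result from a check like this should be treated as a prompt for further investigation, not as a decision in itself.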

 


Notes

Dynamically decreasing xdr-read-threads has been known to cause node crashes in some rare situations. It is therefore advisable to decrease this configuration parameter statically (restart required).

Applies To Earliest Version

Pre 4.9

Applies To Latest Version

Pre 4.9