How to resolve timeouts or XDR destination retries after cold restarting the entire cluster

Context

In a namespace with strong consistency enabled, when any node in a cluster is cold started, we expect all the records in that namespace to go into an un-replicated state until the appeal process completes. Un-replicated records can cause a temporary increase in read/write latency, because each read transaction has to wait for a write transaction to re-replicate the record.

Method

To avoid client timeouts, we can temporarily raise transaction-max-ms. The default is 1000 ms, which may not be enough time for XDR and re-replication to take place. To adjust transaction-max-ms, you can run a command similar to the one below in asadm:

asinfo -v "set-config:context=service;transaction-max-ms=5000"

Once the timeouts subside, you should change transaction-max-ms back to its original value.
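Because the value has to be restored afterwards, it helps to record the current setting before raising it. The following is a minimal sketch, not an official procedure: the helper names extract_param and build_set_request are hypothetical, and it assumes asinfo is on the PATH and that get-config returns its usual semicolon-separated key=value list.

```shell
#!/bin/sh
# Hypothetical helpers for saving and restoring transaction-max-ms.

# Extract one parameter from asinfo's semicolon-separated "key=value" output.
# $1 = parameter name, $2 = raw get-config output
extract_param() {
    printf '%s\n' "$2" | tr ';' '\n' | sed -n "s/^$1=//p"
}

# Build the set-config request string for a timeout given in milliseconds.
# $1 = timeout in ms
build_set_request() {
    printf 'set-config:context=service;transaction-max-ms=%s' "$1"
}

# Intended use on a node (left commented out because it needs a live cluster):
#   original=$(extract_param transaction-max-ms "$(asinfo -v 'get-config:context=service')")
#   asinfo -v "$(build_set_request 5000)"
#   ... wait for the timeouts to subside ...
#   asinfo -v "$(build_set_request "$original")"
```

When asinfo is run directly from the shell rather than inside asadm, it talks to a single node, so the calls would need to be repeated for each node in the cluster. A value changed with set-config is dynamic and is not written to the configuration file.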

To fix the un-replicated records themselves, refer to this article.

Notes

If all the nodes in the cluster are cold-started, all records on the cluster will go into the un-replicated state. There will be no nodes to appeal to, so we may need to fix the un-replicated records manually by following the article referenced above.

Applies To Earliest Version

5.0

Applies To Latest Version

Current Version