
XDR can ship an older version of some records when a node restarts

Problem Description

In some older versions of Aerospike, records on an XDR destination cluster may revert to an older value after nodes in the source cluster are restarted.


Explanation

When a node with XDR enabled is restarted, it always resumes by re-processing the last 5 minutes of its digest log. Log messages similar to the following are observed:

 

Aug 14 2019 13:17:28 GMT: INFO (xdr): (xdr.c:837) Starting XDR with resume ... to ship 12 outstanding log records

...

...

...

Aug 14 2019 13:17:29 GMT: INFO (as): (as.c:445) service ready: soon there will be cake!

Aug 14 2019 13:17:30 GMT: INFO (xdr): (xdr_serverside.c:153) XDR last ship time of this node for DC 0 went back to 1565788311404 from 1565788649324

Aug 14 2019 13:17:30 GMT: INFO (xdr): (xdr_handlers.c:190) replication service ready: and now you have icing!

 

Records that were updated while the restarted node was out of the cluster can therefore have a previous version shipped: if the restarted node re-processes the digests of such records (from the digest log) before migrations complete, it ships its own stale copy rather than the current one.

The number of affected records can be much higher if XDR is lagging when the node is restarted.
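
The size of the re-processing window can be read from the "went back to" message in the log excerpt above. The following is a minimal sketch, assuming the two numbers in that message are epoch timestamps in milliseconds (an interpretation, not something the message states). The regular expression and the function name `rewind_window_ms` are illustrative and not part of any Aerospike tool.

```python
import re

# Log line copied from the excerpt above.
LOG_LINE = (
    "Aug 14 2019 13:17:30 GMT: INFO (xdr): (xdr_serverside.c:153) "
    "XDR last ship time of this node for DC 0 went back to "
    "1565788311404 from 1565788649324"
)

# Illustrative pattern: captures the new and the previous last ship time.
PATTERN = re.compile(r"went back to (\d+) from (\d+)")


def rewind_window_ms(line):
    """Return how far the last ship time was moved back, in milliseconds.

    Assumes both values are epoch milliseconds. Returns None when the
    line does not contain the expected message.
    """
    match = PATTERN.search(line)
    if match is None:
        return None
    new_ms, previous_ms = (int(group) for group in match.groups())
    return previous_ms - new_ms


window_ms = rewind_window_ms(LOG_LINE)
print(f"{window_ms} ms (~{window_ms / 60000:.1f} min)")
# prints "337920 ms (~5.6 min)"
```

For the sample line the window is a little over 5 minutes, in line with the resume behavior described above.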


Solution

There are two possible workarounds for this behavior:

  1. Stop the Aerospike process on the node, wait for failed node processing to finish on the other nodes in the cluster, delete the digest log, and finally restart the Aerospike process.
  2. Set xdr-shipping-enabled to false in the config file on the node being restarted, then dynamically set it to true once migrations have completed.
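
The second workaround can be sketched as a small check that decides when to re-enable shipping. This is a sketch under stated assumptions, not a supported tool: it assumes the node's statistics are available as a semicolon-separated `key=value` string (the format returned by `asinfo -v statistics`), that a statistic named `migrate_partitions_remaining` reports outstanding migrations, and that the dynamic change is made with a `set-config` command in the `xdr` context. Verify the statistic name and the command against the documentation for the server version in use. The helper names are hypothetical.

```python
def parse_info(response):
    """Parse a 'k1=v1;k2=v2' info response into a dict of strings."""
    items = (item.split("=", 1) for item in response.strip().split(";") if "=" in item)
    return {key: value for key, value in items}


def migrations_complete(stats, key="migrate_partitions_remaining"):
    """True only when the (assumed) migration statistic is present and zero."""
    return stats.get(key) == "0"


def reenable_command():
    """Command string to turn shipping back on (assumed set-config syntax)."""
    return 'asinfo -v "set-config:context=xdr;xdr-shipping-enabled=true"'


# Sample response for illustration only; real output has many more fields.
sample = "cluster_size=3;migrate_partitions_remaining=0"

if migrations_complete(parse_info(sample)):
    print(reenable_command())
else:
    print("Migrations still in progress; leave xdr-shipping-enabled set to false.")
```

Treating a missing statistic as "not complete" is deliberate: shipping stays disabled unless the check positively confirms that migrations have finished.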

Notes

This issue is fixed in the following versions of Aerospike: 4.7.0.2 onwards, 4.6.0.4, 4.5.3.6, 4.5.2.6, 4.5.1.11, and 4.5.0.15.

Applies To Earliest Version

Pre 4.9

Applies To Latest Version

Pre 4.9