Detail
During an XDR rewind, XDR lag can report a really high value, often years. Why does this occur?
Answer
Aerospike internally tracks a value called Last Ship Time (LST), which is set to the Last Update Time (LUT) of the last record that was successfully shipped via XDR. Because the LUT represents when a record was written to the XDR source cluster, the difference between the current time and the LST shows how far behind XDR currently is. Aerospike calculates this difference and presents it to the user as Lag. In normal operation, Lag is therefore a good indication of whether XDR is keeping up with incoming writes.
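The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Aerospike's actual implementation; the function name `xdr_lag_seconds` and the sample timestamps are assumptions made for the example.

```python
from datetime import datetime, timezone


def xdr_lag_seconds(last_ship_time: datetime, now: datetime) -> float:
    """Lag as described above: current time minus the Last Ship Time (LST)."""
    return (now - last_ship_time).total_seconds()


# Hypothetical values: the last shipped record was written 5 seconds ago.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
lst = datetime(2024, 1, 1, 11, 59, 55, tzinfo=timezone.utc)

print(xdr_lag_seconds(lst, now))  # 5.0
```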
When you perform a rewind, the LST is artificially changed to the time you are rewinding to, or to the Citrusleaf Epoch (2010-01-01 00:00:00 UTC) if you perform a rewind=all. At this point XDR enters recovery mode and uses the newly set LST to find all records newer than the artificial LST and ship them. During recovery mode the LST does not advance. Only once recovery mode completes does the LST advance, jumping back to the time at which XDR entered recovery mode.
This means that during a rewind, Lag will look artificially high and will not return to a realistic value until the rewind is complete. This is expected and is nothing to worry about.
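To see why the reported value is often measured in years, consider a rewind=all, where the LST is set to the Citrusleaf Epoch. The following sketch applies the same subtraction as the Lag calculation; the constant and function names are hypothetical, and the "current time" is an arbitrary example date.

```python
from datetime import datetime, timezone

# Citrusleaf Epoch, as stated in the article: 2010-01-01 00:00:00 UTC.
CITRUSLEAF_EPOCH = datetime(2010, 1, 1, tzinfo=timezone.utc)

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # approximation, for display only


def lag_after_rewind_all(now: datetime) -> float:
    """Lag in seconds while the LST is pinned to the Citrusleaf Epoch."""
    return (now - CITRUSLEAF_EPOCH).total_seconds()


# Hypothetical "current time" during the rewind.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
lag = lag_after_rewind_all(now)

print(f"lag = {lag:.0f} s, roughly {lag / SECONDS_PER_YEAR:.1f} years")
```

With this example date the computed Lag is roughly 14 years, even though XDR may be shipping records normally.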
During a rewind, monitor XDR throughput rather than Lag to confirm that XDR is shipping records as expected.