Context
When an Aerospike namespace is running in Strong Consistency mode, it ensures that all writes acknowledged as successful to the client have been replicated to all copies of a record. Sometimes a write may fail in flight, perhaps due to a timeout or another transient factor. In these instances the record may not be replicated across all copies. To ensure that reads are consistent, the replication status is checked whenever a record is read. If the record is not marked as replicated, a write (called a re-replication) is initiated within the cluster to replicate the record to the other copies and ensure consistency. For expediency, the replication status of the record is kept in memory.
This means that when a node is cold started in a cluster, the replication status of the records on that node is lost until the state is recovered through the Appeals process with other nodes.
Un-replicated records can cause a temporary increase in read latency as each read has to wait for a write to re-replicate the record. Records in the un-replicated state will also not be shipped by XDR until re-replication is triggered.
The purpose of this article is to examine two methods of re-replicating records across an entire cluster.
Method
Method 1 - Scan and Touch UDF
The first method is a scan and touch UDF. Here, all records in the namespace are scanned and a simple operation, known as a touch, is executed on each one. This is enough to cause an un-replicated record to re-replicate (a full re-write of master and replica, with LUT and generation updated). The TTL is set to -2 in the UDF because otherwise the TTL would be reset to the default TTL for the namespace. If TTLs are not used, this step is not necessary.
    # cat re_repl.lua
    function touch(rec)
        record.set_ttl(rec, -2)
        aerospike:update(rec)
    end

    aql> execute re_repl.touch() on bar
    OK, Scan job (6092651845346587156) created.
Method 2 - UDF Peek
The second UDF method is simpler still and does not even require touching the record. This UDF simply peeks at the record (reads the metadata), which is enough to trigger re-replication. Only master records that are in the un-replicated state are re-replicated using this method.
    # cat re_repl2.lua
    function doNothing(r)
        return
    end

    aql> execute re_repl2.doNothing() on bar
    OK, Scan job (6173968442599032398) created.
The re-replication updates the LUT of the records (master and replicas) but not the generation, and also triggers shipping of those records through XDR.
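Before and after running either method, it can be useful to see how many records are still flagged as un-replicated. The sketch below parses a semicolon-separated namespace statistics response (the kind returned by asinfo -v "namespace/bar") and extracts one counter. The stat name unreplicated_records is an assumption about what the server exposes for strong-consistency namespaces, and the sample string is hypothetical, so check both against the output of your own cluster.

```python
from typing import Dict, Optional


def parse_info_pairs(response: str) -> Dict[str, str]:
    """Split a 'k1=v1;k2=v2' info response into a dict of strings."""
    pairs = {}
    for item in response.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            pairs[key] = value
    return pairs


def unreplicated_count(response: str,
                       stat: str = "unreplicated_records") -> Optional[int]:
    """Return the un-replicated record count, or None if the stat is absent.

    The default stat name is an assumption; pass a different one if your
    server version reports it under another name.
    """
    value = parse_info_pairs(response).get(stat)
    return int(value) if value is not None else None


# Hypothetical sample, shaped like namespace statistics output.
sample = "ns_cluster_size=3;objects=1200;unreplicated_records=42"
print(unreplicated_count(sample))
```

Running the check on each node before the scan and again once the scan job has finished gives a rough indication of whether the backlog has been cleared.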
Notes
The UDF Peek seems to be the faster of the two methods. Both methods cause a record to be shipped by XDR once it has been re-written in the cluster and is no longer in the un-replicated state.
All of the methods outlined here will, of course, update the last update time of the record concerned and any bins within the record.
All these methods are based on running a basic scan on the cluster. Scans can be resource heavy and potentially disruptive to normal load. Furthermore, the number of scans that can run concurrently is limited, so running these scans could interfere with scans being run as part of the application. For this reason it is best to run re-replication during off-peak load times.
- Note also that records can occasionally remain un-replicated even after the peek method has been run: https://support.aerospike.com/s/article/Why-does-UDF-Peek-not-clear-the-unreplicated-records-state
- Un-replicated tombstone records can also remain. These may be cleaned up by tomb-raider cycles, but this is not guaranteed until the blocks are overwritten. Such un-replicated tombstones do not affect any cluster operations.
- If it is necessary to throttle the scan, try tuning the following two configuration parameters:
  - background-query-max-rps (in versions prior to 6.0: background-scan-max-rps)
  - query-threads-limit (in versions prior to 6.0: scan-threads-limit)
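Because the two throttling parameters were renamed in 6.0, a small helper can pick the right names for a given server version. This is a minimal sketch that only builds the set-config info command strings and does not contact a cluster. The function name, the example values, the assumption that the rps limit lives in the namespace context and the threads limit in the service context, and the id= namespace selector are all illustrative, so verify them against the configuration reference for your server version before use.

```python
from typing import List


def throttle_commands(namespace: str, max_rps: int,
                      threads_limit: int, server_major: int) -> List[str]:
    """Build set-config info commands for throttling a background scan.

    Parameter names follow the article: the query-* names apply from
    server 6.0 onwards, the scan-* names to earlier versions.
    Contexts and the 'id=' selector are assumptions (see lead-in).
    """
    if server_major >= 6:
        rps_name = "background-query-max-rps"
        threads_name = "query-threads-limit"
    else:
        rps_name = "background-scan-max-rps"
        threads_name = "scan-threads-limit"
    return [
        f"set-config:context=namespace;id={namespace};{rps_name}={max_rps}",
        f"set-config:context=service;{threads_name}={threads_limit}",
    ]


# Hypothetical values for namespace 'bar' on a 6.x server.
for command in throttle_commands("bar", 2000, 32, server_major=6):
    print(command)
```

Each resulting string could then be passed to asinfo -v on every node, for example during the off-peak window chosen for the re-replication scan.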