
When Upgrading from 4.9 or 5.X to 6.X Server Why Do I See Stalled Migrations?

Problem Description

When upgrading from server version 4.9 or 5.X to 6.0+, migrations may stall on some nodes if the namespace persists its data to storage (PMEM excluded).

Explanation

In server 6.0, the Aerospike Database internal storage format was changed to include a four-byte record end mark. This change addresses an unlikely condition in which a partially persisted write block could leave a record corrupted after an unclean shutdown. As a side effect, the extra 4 bytes can push records in a persisted namespace past write-block-size, and the receiving 6.X node rejects such records.
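To make the size arithmetic concrete, here is a minimal shell sketch that checks whether a record still fits in a write block once the four-byte end mark is added. The function name is hypothetical, the 1048576-byte (1 MiB) write-block-size in the usage line is only an example value, and the sketch assumes a record is rejected when its new size exceeds write-block-size.

```shell
# Hypothetical helper (not an Aerospike tool): does a record of OLD_SIZE bytes
# in the pre-6.0 format still fit once the 4-byte record end mark is added?
# Usage: fits_after_upgrade OLD_SIZE WRITE_BLOCK_SIZE
fits_after_upgrade() {
    old_size=$1
    write_block_size=$2
    new_size=$((old_size + 4))   # 6.0+ adds a four-byte record end mark
    # Assumption: the record is rejected when it exceeds write-block-size.
    if [ "$new_size" -le "$write_block_size" ]; then
        echo "fits ($new_size bytes)"
    else
        echo "too big ($new_size bytes)"
    fi
}
```

For example, `fits_after_upgrade 1048576 1048576` reports a new size of 1048580 bytes, which no longer fits in a 1 MiB write block.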

Solution

To identify the digest(s) in question we can temporarily enable DETAIL logging for the drv_ssd context:
asinfo -v "log-set:id=SINK-ID;drv_ssd=detail"
or in asadm:
Admin+> manage config logging file </path/to/file.log> param drv_ssd to detail

Then, on the 6.X node receiving migrations, check the logs for digests that are rejected repeatedly:
# tail -f /var/log/aerospike.log | grep DETAIL
Aug 03 2023 19:53:46 GMT: DETAIL (drv_ssd): (drv_ssd.c:1549) {bar} write: size 1048580 - rejecting f59124986e96ad175b374c9487945bbcad537b74
Aug 03 2023 19:53:51 GMT: DETAIL (drv_ssd): (drv_ssd.c:1549) {bar} write: size 1048580 - rejecting f59124986e96ad175b374c9487945bbcad537b74
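Rather than scanning the log by eye, the repeated digests can be counted with a short shell sketch like the one below. The function name is hypothetical, and the parsing assumes the log line layout shown above: the namespace in braces and the digest as the last field of a drv_ssd DETAIL line containing "rejecting".

```shell
# Hypothetical helper: read Aerospike log lines on stdin and print
# "<count> <namespace> <digest>" for every digest rejected more than once.
repeated_rejects() {
    awk '/DETAIL \(drv_ssd\)/ && /rejecting/ {
             # The namespace is the field that starts with "{";
             # the digest is the last field on the line.
             ns = ""
             for (i = 1; i <= NF; i++)
                 if (substr($i, 1, 1) == "{") ns = $i
             count[ns " " $NF]++
         }
         END {
             for (k in count)
                 if (count[k] > 1) print count[k], k
         }' | sort -rn
}
```

It can be run against the log file used above, for example `repeated_rejects < /var/log/aerospike.log`. Digests that appear only once are omitted, since a single rejection is less likely to indicate a stalled migration.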

Once the digest is identified, the record may need to be deleted so that the other records in the partition can migrate out of the 5.X nodes.
After the record is deleted, a recluster must be triggered so that migrations can proceed:
 
Admin+> manage recluster

 

Notes

Records that are too big to be migrated do NOT increment metrics such as fail_record_too_big or client_write_error. Therefore, if migrate_record_retransmits is incrementing steadily and fail_record_too_big is NOT increasing at a matching rate, this may be a case of records breaching write-block-size when moving to 6.0+.
 
Admin+> show stat like record_too_big migrate_record_retransmit write_error for bar
~~~~~~~~~~~~~~bar Namespace Statistics (2023-08-03 19:59:32 UTC)~~~~~~~~~~~~~~
Node                            |172.17.0.10:3000|172.17.0.11:3000|57-1:3000
batch_sub_write_error           |--              |0               |--
client_write_error              |--              |0               |13960
fail_record_too_big             |--              |0               |4
from_proxy_batch_sub_write_error|--              |0               |--
from_proxy_write_error          |--              |0               |5341
migrate_record_retransmits      |--              |0               |4292
ops_sub_write_error             |--              |0               |0
xdr_client_write_error          |--              |0               |0
xdr_from_proxy_write_error      |--              |0               |0
Number of rows: 10
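The comparison described in this note can also be scripted. The following shell sketch uses two hypothetical helpers: one extracts a single statistic from "key=value" output, such as that returned by asinfo -v "namespace/bar", and the other compares how much each counter grew between two samples. The rule it applies (retransmits growing faster than fail_record_too_big) is an assumption used for illustration, not an official threshold.

```shell
# Hypothetical helper: print the value of stat NAME from "key=value" pairs
# on stdin, separated by semicolons or newlines.
# Usage: asinfo -v "namespace/bar" | get_stat migrate_record_retransmits
get_stat() {
    tr ';' '\n' | awk -F= -v name="$1" '$1 == name { print $2 }'
}

# Hypothetical helper: compare counter growth between two samples.
# Usage: classify_retransmits RETRANSMIT_DELTA TOO_BIG_DELTA
classify_retransmits() {
    # Assumption: retransmits growing while fail_record_too_big grows less
    # suggests records rejected during migration rather than client writes.
    if [ "$1" -gt 0 ] && [ "$2" -lt "$1" ]; then
        echo "suspect oversized records in migration"
    else
        echo "no mismatch"
    fi
}
```

Sampling both statistics twice, a few minutes apart, and passing the two differences to classify_retransmits gives a quick first indication. The DETAIL log lines remain the way to confirm which digests are involved.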
 
  • Find potential problems before upgrading
  • Clients may continue to write, and may write a record that is too large. However, it is uncommon for clients to repeatedly attempt to write the same oversized key, so repeated digests in the log likely indicate records rejected during migration
  • If migrate-threads=1, only one partition is migrated out at a time. The same record can then be retransmitted over and over, effectively stalling the other partitions from migrating out, because migration cannot proceed until the record is removed
  • The size in the log line above is the record size, not the device size (the amount of space taken up on the device)

Applies To Earliest Version

4.9

Applies To Latest Version

5.7