Detail
Why do customers see migrations occurring when migrate-fill-delay is set? Why do some of these migrations get stuck and then clear after a period of time?
Answer
In Aerospike, the migrate-fill-delay parameter sets how long, in seconds, the cluster waits before starting fill migrations, that is, migrations that populate a partition on a node that does not yet hold a copy of it. It is typically used to reduce the impact of migrations on system performance by avoiding unnecessary data movement during short-lived cluster changes, such as a node restart for maintenance.

However, in a cluster that uses strong consistency, ensuring data consistency across nodes takes precedence over the migration delay setting. After a cluster event or re-roster, partitions migrate to the new roster master and roster replica without waiting for migrate-fill-delay. Normally, only partitions owned by the roster master and roster replica are migrated, which is why migrations are seen even though the delay is set.

In some scenarios, more partitions than those owned by the roster master and roster replica need to migrate. These appear "stuck": they show as active migrations, but no data is moving. This behaviour occurs when the cluster uses a tie-breaker node configured with stay-quiesced. During a cluster event or re-roster, the tie-breaker node becomes the rightful owner of some partitions. Because it stays quiesced, other nodes end up owning partitions they would not otherwise own, and those partitions also have to migrate. Since these are not roster-master or roster-replica partitions, their migrations must wait for migrate-fill-delay to elapse, after which they proceed and the "stuck" migrations clear.
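The rule described above can be illustrated with a small toy model. This is only a sketch for reasoning about the behaviour, not Aerospike code: the `Migration` class, the `is_delayed` function, the node names (`A`, `B`, and a stay-quiesced tie-breaker `T`), and the 3600-second delay are all hypothetical, and the model assumes a migration waits for migrate-fill-delay only when its target is neither the roster master nor the roster replica of the partition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Migration:
    """Hypothetical record of one pending migration in the toy model."""
    partition_id: int
    target_node: str


def is_delayed(migration, roster_owners, fill_delay_seconds):
    """Return True if the migration waits for the fill delay in this toy model.

    Assumption: migrations that target the roster master or roster replica
    of a partition start immediately; any other target waits for the delay.
    """
    owners = roster_owners[migration.partition_id]
    targets_roster_owner = migration.target_node in owners
    return fill_delay_seconds > 0 and not targets_roster_owner


# Hypothetical cluster: data nodes A and B, plus tie-breaker T (stay-quiesced).
# Each entry maps a partition to its (roster master, roster replica).
roster_owners = {
    0: ("A", "B"),  # both roster owners are regular data nodes
    1: ("T", "A"),  # T is the rightful owner but stays quiesced
}

# Partition 0 migrates to its roster replica B.
# Partition 1 must be held by B in place of the quiesced T, so B is a
# non-roster owner for that partition.
migrations = [Migration(0, "B"), Migration(1, "B")]

delayed = [m for m in migrations if is_delayed(m, roster_owners, 3600)]
print(delayed)
```

In this model, the migration of partition 1 to node B is the one that would show as active but not progressing until the delay elapses.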
Notes
Defining Fill Migrations: https://aerospike.com/docs/server/operations/manage/migration/delay_migrations#defining-fill-migrations
Managing Data Consistency: https://aerospike.com/docs/server/operations/manage/consistency