Detail
When a node is removed from a cluster configured with the default prefer-uniform-balance setting (true), there appears to be additional migrations occurring. Is this expected?
Answer
The use of prefer-uniform-balance has major advantages as it ensures partitions are evenly distributed across nodes in the cluster. Prior to the introduction of the prefer-uniform-balance algorithm to balance the partition distribution, a variation of around 10% in the number of partitions owned by nodes would often be observed, sometimes even much higher. The optimized uniform distribution comes at a small cost: there may be additional migrations needed to keep this uniformity among nodes in a cluster. This usually comes to light when a node is permanently removed from a cluster.
In non-uniform-balance, when a node is removed, all migrations will be one-way migrations for partitions that were owned by that node. With prefer-uniform-balance, some of the adjusted partitions will cause additional one-way or two-way migrations.
Rack-aware with prefer-uniform-balance does add more migrations because the rack-aware constraint restricts which nodes can own replica copies of partitions (as rack aware prevents more than one copy of a partition to be in the same rack). This restriction required us to allow more head-room to achieve near-optimal balance.
In summary:
- With rack aware and
prefer-uniform-balancethere can be much more migrations that will occur compared to when not using those features. - In rack aware, there are adjustments being made to the partition distribution when there are about 1024 partitions remaining to assign (or balance – for non rack aware those adjustment happen when there are only 128 partitions remaining to assign).
- The worst case scenario for rack aware with
prefer-uniform-balanceand RF (replication factor) 2 is 2048 partition migrations (RF * 1024) . - In a non uniform balance scenario, there would be no migrations adjustment and migrations would only be one-way when taking down a node.
A node will not drop a partition until it has successfully migrated to another node.