Articles in this section

Outbound migration stalls when max-num-incoming set to 0

Problem Description

When clusters are migrating, partitions being migrated out are not dropped on the node of origin until the receiving node has signaled that it has a full copy of the partition.  This means that there is a temporary increase in resource usage during migrations, which may cause problems.  In this situation, a common technique is to shut down incoming migrations by setting migrate-max-num-incoming to 0.  This stops the flow of incoming partitions and allows the node to migrate out first, preserving space.

In some situations, it has been observed that outbound migrations then stall.  

Explanation

In certain cluster states, a rebalance may happen whereby a partition has both a new master and a new replica node.  In this circumstance, the migration mechanism means that the partition in question must migrate into the master node and from there migrate out to the new replica.  If migrate-max-num-incoming is set to 0 the inbound partition cannot migrate into the master node and therefore cannot be migrated onto the new replica node meaning that migrations appear to stall.  

The migrations have not stalled, they have been artificially halted.

Solution

To get the outbound migrations moving, inbound migrations must be allowed to happen.  If resources are an issue then migrate-max-num-incoming can be set to a lower figure, such as 1.

Applies To Earliest Version

Pre 4.9

Applies To Latest Version

Current Version
Was this article helpful?
0 out of 0 found this helpful