Why is my data distribution imbalanced when rack-aware and prefer-uniform-balance are enabled?

Problem Description

Data is not evenly distributed across the nodes in the cluster even though rack-aware and prefer-uniform-balance are enabled.


device_available_pct usage

~test Namespace Statistics (2023-12-19 07:11:17 GMT)~
            Node | device_available_pct
172.17.0.1:3000  |                   30
172.17.0.2:3000  |                   27
172.17.0.3:3000  |                   27
172.17.0.4:3000  |                   30
172.17.0.5:3000  |                   30
172.17.0.6:3000  |                   27
172.17.0.7:3000  |                   30
172.17.0.8:3000  |                   30
172.17.0.9:3000  |                   27
172.17.0.10:3000 |                   27
172.17.0.11:3000 |                   27
172.17.0.12:3000 |                   30
172.17.0.13:3000 |                   27
172.17.0.14:3000 |                   27
Number of rows: 14


Rack aware configuration

~test Namespace Configuration (2023-12-19 07:11:17 GMT)~
            Node | rack-id
172.17.0.1:3000  |       1
172.17.0.2:3000  |       2
172.17.0.3:3000  |       2
172.17.0.4:3000  |       1
172.17.0.5:3000  |       1
172.17.0.6:3000  |       3
172.17.0.7:3000  |       1
172.17.0.8:3000  |       1
172.17.0.9:3000  |       3
172.17.0.10:3000 |       4
172.17.0.11:3000 |       4
172.17.0.12:3000 |       1
172.17.0.13:3000 |       5
172.17.0.14:3000 |       5
Number of rows: 14


Explanation

Uneven data distribution can occur when racks contain different numbers of nodes. In the example above, rack 1 has six nodes while racks 2-5 have two nodes each. When prefer-uniform-balance is enabled, the master records are distributed as evenly as possible across the nodes. The replicas, however, cannot be distributed evenly, because a replica cannot reside on the same rack as its master; it must be placed on a node in one of the other racks. In the example above, all the replicas of the masters on rack 1 must be distributed among the eight nodes in racks 2-5, so those nodes hold more data than the nodes on rack 1. This matches the output above: the nodes on racks 2-5 report a device_available_pct of 27, while the nodes on rack 1 report 30.
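The reasoning above can be illustrated with a rough back-of-the-envelope model. The sketch below is not Aerospike's partition assignment algorithm; it assumes a replication factor of 2 (not stated in this article), masters split evenly across all nodes, and each replica equally likely to land on any node outside its master's rack. The names RACK_OF_NODE and expected_share_per_rack are made up for the illustration.

```python
from collections import Counter
from fractions import Fraction

# Rack layout copied from the rack-id table in this article.
# Key: last octet of the node address, value: rack-id.
RACK_OF_NODE = {
    1: 1, 2: 2, 3: 2, 4: 1, 5: 1, 6: 3, 7: 1,
    8: 1, 9: 3, 10: 4, 11: 4, 12: 1, 13: 5, 14: 5,
}


def expected_share_per_rack(rack_of_node):
    """Return {rack_id: share of the unique data held by EACH node in that rack}.

    Simplified model (assumptions, not Aerospike internals):
    replication factor 2, masters spread evenly over all nodes, and every
    replica placed uniformly on the nodes outside the master's rack.
    """
    n_nodes = len(rack_of_node)
    rack_sizes = Counter(rack_of_node.values())
    master_share = Fraction(1, n_nodes)  # even master distribution
    shares = {}
    for rack in rack_sizes:
        replica_share = Fraction(0)
        for other_rack, other_size in rack_sizes.items():
            if other_rack == rack:
                continue  # replicas never stay on the master's rack
            # Masters on other_rack hold other_size/n_nodes of the data;
            # their replicas are spread over the nodes outside other_rack.
            eligible_nodes = n_nodes - other_size
            replica_share += Fraction(other_size, n_nodes) / eligible_nodes
        shares[rack] = master_share + replica_share
    return shares


shares = expected_share_per_rack(RACK_OF_NODE)
for rack, share in sorted(shares.items()):
    print(f"rack {rack}: each node holds about {float(share):.3f} of the unique data")
```

Under these assumptions, each node on rack 1 holds 5/42 of the unique data (about 0.119) and each node on racks 2-5 holds 9/56 (about 0.161). The exact figures on a real cluster will differ, as the device_available_pct output above shows, but the direction of the skew is the same: nodes on the smaller racks carry more replicas.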


Solution

It is recommended that the number of racks equal the replication-factor, unless there is a tie-breaker node, which can be placed on an additional, separate rack. It is also recommended to have the same number of nodes in each rack. Reconfiguring the racks to follow these recommendations should distribute the data more evenly across the nodes.
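As a quick sanity check, a planned rack layout can be compared against the two recommendations above before it is applied. The following is a minimal sketch: check_rack_layout is a hypothetical helper, the replication factor of 2 is an assumption (this article does not state it), the proposed layout is only an illustration, and the tie-breaker exception is not modelled.

```python
from collections import Counter


def check_rack_layout(rack_ids, replication_factor):
    """Return a list of warnings; an empty list means both recommendations hold.

    rack_ids: one rack-id per node in the cluster.
    Note: the tie-breaker-node exception is not handled here.
    """
    rack_sizes = Counter(rack_ids)
    warnings = []
    if len(rack_sizes) != replication_factor:
        warnings.append(
            f"{len(rack_sizes)} racks configured, "
            f"replication factor is {replication_factor}"
        )
    if len(set(rack_sizes.values())) > 1:
        warnings.append(f"uneven nodes per rack: {dict(sorted(rack_sizes.items()))}")
    return warnings


# rack-id values from the table in this article, in node order.
current = [1, 2, 2, 1, 1, 3, 1, 1, 3, 4, 4, 1, 5, 5]
# Hypothetical rebalanced layout: the same 14 nodes, 7 per rack.
proposed = [1] * 7 + [2] * 7

print(check_rack_layout(current, replication_factor=2))
print(check_rack_layout(proposed, replication_factor=2))
```

With the assumed replication factor of 2, the current layout triggers both warnings, while the hypothetical two-rack layout triggers none.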


Applies To Earliest Version

Pre 4.9

Applies To Latest Version

Current Version