Context
To troubleshoot issues such as an imbalance in the number of records across nodes, or to assess the current health of a cluster, it is often useful to inspect the partition map information.
Method
1. Run the following on an Aerospike node.
asinfo -v 'partition-info' -l
2. Review the output.
The output has the following format and covers the status of all 4096 partitions of each namespace, as seen by this node.
# asinfo -v 'partition-info' -l | head
namespace:partition:state:n_replicas:replica:n_dupl:working_master:emigrates:lead_emigrates:immigrates:records:tombstones:regime:version:final_version
test:0:S:1:0:0:BB9020011AC4202:0:0:0:0:0:0:bc72f6219106.0.mp-:bc72f6219106.0.mp-
test:1:S:1:0:0:BB9020011AC4202:0:0:0:0:0:0:bc72f6219106.0.mp-:bc72f6219106.0.mp-
test:2:S:1:0:0:BB9020011AC4202:0:0:0:0:0:0:bc72f6219106.0.mp-:bc72f6219106.0.mp-
test:3:S:1:0:0:BB9020011AC4202:0:0:0:0:0:0:bc72f6219106.0.mp-:bc72f6219106.0.mp-
[...]
bar:0:S:1:0:0:BB9020011AC4202:0:0:0:0:0:0:bc72f6219106.0.mp-:bc72f6219106.0.mp-
bar:1:S:1:0:0:BB9020011AC4202:0:0:0:0:0:0:bc72f6219106.0.mp-:bc72f6219106.0.mp-
bar:2:S:1:0:0:BB9020011AC4202:0:0:0:0:0:0:bc72f6219106.0.mp-:bc72f6219106.0.mp-
bar:3:S:1:0:0:BB9020011AC4202:0:0:0:0:0:0:bc72f6219106.0.mp-:bc72f6219106.0.mp-
[...]
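As a rough illustration, the colon-delimited output can be turned into one record per partition by using the header line as field names. This is a minimal Python sketch, not an official tool: the function name `parse_partition_info` is hypothetical, and it assumes the first non-empty line is the header shown above and that no field value contains a colon.

```python
def parse_partition_info(text):
    """Parse `asinfo -v 'partition-info' -l` output into a list of dicts.

    Assumption: the first non-empty line is the colon-separated header and
    every data line has the same number of fields as the header.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split(":")
    rows = []
    for ln in lines[1:]:
        fields = ln.split(":")
        if len(fields) != len(header):
            continue  # skip lines that do not match the header layout
        rows.append(dict(zip(header, fields)))
    return rows


# Sample taken from the output shown above (two partitions only).
sample = """\
namespace:partition:state:n_replicas:replica:n_dupl:working_master:emigrates:lead_emigrates:immigrates:records:tombstones:regime:version:final_version
test:0:S:1:0:0:BB9020011AC4202:0:0:0:0:0:0:bc72f6219106.0.mp-:bc72f6219106.0.mp-
bar:0:S:1:0:0:BB9020011AC4202:0:0:0:0:0:0:bc72f6219106.0.mp-:bc72f6219106.0.mp-
"""
rows = parse_partition_info(sample)
print(rows[0]["namespace"], rows[0]["state"], rows[0]["version"])
```

All values are kept as strings, so numeric fields such as `records` need an explicit `int(...)` before any arithmetic.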
Notes
- To understand the specific fields:
- namespace: Name of the namespace that the partition belongs to
- partition: The partition number (from 0 to 4095)
- state: The state of the partition - A (Absent), S (Sync), D (Desync), Z (Zombie), X (eXcluded, CP only), U (Unknown) - see below
- replica: The replica index held by this node for the partition - 0 (final master), 1 (1st replica, possibly acting master), … -1 means not present on this node.
- n_dupl: If this node is acting master or final master, n_dupl is the number of nodes that have a version considered different from our own; otherwise it is 0.
- working_master: If this node is acting master or final master, working_master is the node_id of the node that can take writes (the working master); otherwise it is 0.
- emigrates: Number of outbound migrations scheduled for this partition on this node.
- immigrates: Number of inbound migrations scheduled for this partition on this node.
- records: Number of records residing on the partition.
- tombstones: Number of tombstones existing on the partition.
- version: The version of the partition. The version has little to do with ongoing migrations. The ongoing migrations were scheduled based on the prior versions. These versions will dictate scheduling in later rebalances.
- final_version: The version this partition will have once its migrations have completed.
- Note: Both version and final_version should match after migrations.
- All master partitions for a namespace should have the same version and final_version after migrations
- All replica partitions for a namespace should have the same version and final_version after migrations
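The version check described in the note above could be sketched roughly as follows. This is an illustrative Python snippet rather than an Aerospike-provided tool: `version_mismatches` is a hypothetical helper, and the second data line in the sample is invented purely to show a mismatch.

```python
def version_mismatches(text):
    """Return (namespace, partition, version, final_version) tuples for
    partitions whose version differs from final_version.

    Assumption: the first non-empty line of `text` is the header line.
    """
    header = None
    mismatches = []
    for ln in text.splitlines():
        ln = ln.strip()
        if not ln:
            continue
        fields = ln.split(":")
        if header is None:
            header = fields
            continue
        if len(fields) != len(header):
            continue  # ignore lines that do not match the header layout
        row = dict(zip(header, fields))
        if row["version"] != row["final_version"]:
            mismatches.append((row["namespace"], row["partition"],
                               row["version"], row["final_version"]))
    return mismatches


# First data line is from the sample output; the second is INVENTED
# (placeholder version strings) to demonstrate a mismatch.
sample = """\
namespace:partition:state:n_replicas:replica:n_dupl:working_master:emigrates:lead_emigrates:immigrates:records:tombstones:regime:version:final_version
test:0:S:1:0:0:BB9020011AC4202:0:0:0:0:0:0:bc72f6219106.0.mp-:bc72f6219106.0.mp-
test:1:S:1:0:0:BB9020011AC4202:0:0:0:0:0:0:EXAMPLE-A:EXAMPLE-B
"""
mismatches = version_mismatches(sample)
print(mismatches)
```

An empty result after migrations have finished is what the note leads one to expect; a non-empty result while migrations are ongoing is not by itself a sign of trouble.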
- State of the Partition:
- Sync is a replica that doesn’t have any outstanding immigrations.
- Desync is a replica that does have outstanding immigrations.
- Absent is a non-replica and it doesn’t have any data*.
- Zombie is a non-replica and it does have data.
- eXcluded is CP only: X is a non-replica that may have lost data while it was down (such as an operator wiping its disks) and is therefore “excluded” from super majority determinations.
- *By “having data” we mean that it has a version and could have taken writes; if it didn’t take writes we still consider it as “having data”.
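To get a quick view of how many partitions on a node are in each of these states, the output can be tallied per namespace and state. The following is a small Python sketch under the assumption that the header line is present; `count_states` is a hypothetical name, and the Desync and Absent lines in the sample are invented for illustration.

```python
from collections import Counter


def count_states(text):
    """Count partitions per (namespace, state) in partition-info output."""
    counts = Counter()
    header = None
    for ln in text.splitlines():
        ln = ln.strip()
        if not ln:
            continue
        fields = ln.split(":")
        if header is None:
            header = fields  # first non-empty line is assumed to be the header
            continue
        if len(fields) != len(header):
            continue
        row = dict(zip(header, fields))
        counts[(row["namespace"], row["state"])] += 1
    return counts


# The S line is from the sample output; the D and A lines are INVENTED.
sample = """\
namespace:partition:state:n_replicas:replica:n_dupl:working_master:emigrates:lead_emigrates:immigrates:records:tombstones:regime:version:final_version
test:0:S:1:0:0:BB9020011AC4202:0:0:0:0:0:0:bc72f6219106.0.mp-:bc72f6219106.0.mp-
test:1:D:1:0:0:BB9020011AC4202:0:0:1:0:0:0:EXAMPLE:EXAMPLE
test:2:A:1:-1:0:0:0:0:0:0:0:0:EXAMPLE:EXAMPLE
"""
counts = count_states(sample)
for (namespace, state), n in sorted(counts.items()):
    print(namespace, state, n)
```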
- Verifications and additional details
- In a cluster with RF=2 and no migrations, for each namespace the total number of Sync master partitions (“S:0”, that is, state S with replica 0) across all nodes should equal the total number of Sync replica partitions (“S:1”), and both should be 4096.
- It is acceptable to see some partitions in the Desync state if a cluster has ongoing migrations.
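The RF=2 verification above could be automated along these lines once the command output has been collected from every node. This is only a sketch: `count_sync_roles` and `fake_node` are hypothetical names, and the per-node input here is synthetic data generated in the script, not real cluster output.

```python
from collections import defaultdict

N_PARTITIONS = 4096  # partitions per namespace, as stated above

HEADER = ("namespace:partition:state:n_replicas:replica:n_dupl:working_master:"
          "emigrates:lead_emigrates:immigrates:records:tombstones:regime:"
          "version:final_version")


def count_sync_roles(outputs):
    """outputs: iterable of partition-info text, one entry per node.

    Returns {namespace: [sync_masters, sync_first_replicas]} summed over
    all nodes.
    """
    counts = defaultdict(lambda: [0, 0])
    for text in outputs:
        header = None
        for ln in text.splitlines():
            ln = ln.strip()
            if not ln:
                continue
            fields = ln.split(":")
            if header is None:
                header = fields
                continue
            if len(fields) != len(header):
                continue
            row = dict(zip(header, fields))
            if row["state"] == "S" and row["replica"] in ("0", "1"):
                counts[row["namespace"]][int(row["replica"])] += 1
    return dict(counts)


def fake_node(master_pids):
    """Build SYNTHETIC output for a node that is master for `master_pids`
    and first replica for every other partition of namespace 'test'."""
    lines = [HEADER]
    for pid in range(N_PARTITIONS):
        replica = 0 if pid in master_pids else 1
        lines.append(f"test:{pid}:S:2:{replica}:0:0:0:0:0:0:0:0:v1:v1")
    return "\n".join(lines)


# Two synthetic nodes, each master for half of the partitions.
node_a = fake_node(range(0, 2048))
node_b = fake_node(range(2048, 4096))
counts = count_sync_roles([node_a, node_b])
for namespace, (masters, replicas) in counts.items():
    ok = masters == replicas == N_PARTITIONS
    print(namespace, masters, replicas, "OK" if ok else "CHECK")
```

Because the check sums over all nodes, it is only meaningful when every node in the cluster has contributed its output.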
IMPORTANT: partition-info is considered an internal debugging command, meaning its invocation or output may change in any Aerospike release without prior warning. Be extremely careful when using this command in any script or automated tool.