Context
Split brain is a state in which a cluster splits into multiple smaller sub-clusters. This article covers how to recognize it in both Available and Strong-Consistency modes in Aerospike.
Method
Here are the main symptoms of a cluster that has split:
- Check cluster_size on each node. If it is less than the expected cluster size on some nodes, this indicates that some nodes have left the cluster or formed a sub-cluster, potentially with other nodes.
Note: A single node departing the cluster is not really considered a split brain. Even if the node is still alive, there are safeguards against it taking ownership of all the partitions (the node switches to what is called the orphan state), and clients should recognize this situation and not submit transactions to such an 'orphaned' node.
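The cluster_size comparison can be sketched in Python. This is a minimal illustration, not an official tool: it assumes you have already collected each node's statistics output as a semicolon-separated `key=value` string, and the function names `parse_cluster_size` and `find_undersized_nodes` as well as the node labels are hypothetical.

```python
def parse_cluster_size(stats_response):
    """Extract cluster_size from a 'key=value;key=value' statistics string.

    The input format is an assumption about how the statistics were collected.
    """
    for pair in stats_response.split(";"):
        key, sep, value = pair.partition("=")
        if sep and key.strip() == "cluster_size":
            return int(value)
    raise ValueError("cluster_size not found in statistics response")


def find_undersized_nodes(stats_by_node, expected_size):
    """Return {node: cluster_size} for nodes that see fewer nodes than expected."""
    undersized = {}
    for node, stats in stats_by_node.items():
        size = parse_cluster_size(stats)
        if size < expected_size:
            undersized[node] = size
    return undersized


if __name__ == "__main__":
    # Hypothetical sample: node labels and values are made up for illustration.
    sample = {
        "node-a": "cluster_size=3;other_stat=1",
        "node-b": "cluster_size=3;other_stat=1",
        "node-c": "other_stat=1;cluster_size=2",
    }
    print(find_undersized_nodes(sample, expected_size=3))
```

Any node returned by `find_undersized_nodes` is worth a closer look. A node reporting a cluster_size of 1 most likely matches the single-node departure described in the note rather than a true split brain.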
- Grep the logs for the keywords "departed" and/or "applied cluster size". These indicate that a node has departed the cluster and that a new cluster size has been applied.
For example:
Sep 26 2018 06:51:20 GMT: INFO (fabric): (fabric.c:2486) fabric: node bb9f2054e2ac362 departed
...
Sep 26 2018 06:51:21 GMT: INFO (clustering): (clustering.c:5808) applied cluster size 2
This indicates which node departed and the effective cluster_size after that.
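As an alternative to grep, the same search can be sketched in Python. The patterns below are derived only from the sample log lines shown above, so other server versions may format these messages differently; `scan_cluster_changes` is a hypothetical helper name.

```python
import re

# Patterns modeled on the sample log lines in this article (an assumption).
DEPARTED_RE = re.compile(r"fabric: node (\w+) departed")
APPLIED_RE = re.compile(r"applied cluster size (\d+)")


def scan_cluster_changes(lines):
    """Return a list of ('departed', node_id) and ('cluster_size', n) events."""
    events = []
    for line in lines:
        departed = DEPARTED_RE.search(line)
        if departed:
            events.append(("departed", departed.group(1)))
        applied = APPLIED_RE.search(line)
        if applied:
            events.append(("cluster_size", int(applied.group(1))))
    return events


if __name__ == "__main__":
    sample_log = [
        "Sep 26 2018 06:51:20 GMT: INFO (fabric): (fabric.c:2486) fabric: node bb9f2054e2ac362 departed",
        "Sep 26 2018 06:51:21 GMT: INFO (clustering): (clustering.c:5808) applied cluster size 2",
    ]
    for event in scan_cluster_changes(sample_log):
        print(event)
```

Running this against the log of every node makes it easier to line up which node departed with the cluster size each remaining node applied afterwards.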
- Check for rebalance and migrations. Any cluster change triggers a rebalance followed by migrations (redistribution of the partitions across the nodes in the cluster).
{ns_name} rebalanced: expected-migrations (1215,1224) expected-signals 1215 fresh-partitions 397
...
{ns_name} migrations: remaining (654,289,254) active (1,1,0) complete-pct 88.49
Or, for strong-consistency enabled namespaces (notice the unavailable_partitions statistic):
{ns_name} rebalanced: regime 295 expected-migrations (826,826) expected-signals 826 expected-appeals 0 unavailable-partitions 425
Refer to the monitoring migrations doc for details.
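The rebalance and migrations lines shown above can also be parsed programmatically, for example to watch complete-pct or to flag unavailable partitions on strong-consistency namespaces. The following is a rough sketch whose patterns assume exactly the line layout of the samples in this article; `parse_migrations` and `parse_unavailable` are hypothetical helper names, and the grouped numbers are returned as-is without interpretation.

```python
import re

# Patterns modeled on the sample lines in this article (an assumption).
MIGRATIONS_RE = re.compile(
    r"\{(?P<ns>[^}]+)\} migrations: remaining \((?P<remaining>[\d,]+)\) "
    r"active \((?P<active>[\d,]+)\) complete-pct (?P<pct>[\d.]+)"
)
UNAVAILABLE_RE = re.compile(
    r"\{(?P<ns>[^}]+)\} rebalanced: .*unavailable-partitions (?P<count>\d+)"
)


def _to_ints(csv):
    """Convert '654,289,254' into (654, 289, 254)."""
    return tuple(int(part) for part in csv.split(","))


def parse_migrations(line):
    """Return migration progress as a dict, or None if the line does not match."""
    match = MIGRATIONS_RE.search(line)
    if not match:
        return None
    return {
        "namespace": match.group("ns"),
        "remaining": _to_ints(match.group("remaining")),
        "active": _to_ints(match.group("active")),
        "complete_pct": float(match.group("pct")),
    }


def parse_unavailable(line):
    """Return (namespace, unavailable_partitions), or None if not present."""
    match = UNAVAILABLE_RE.search(line)
    if not match:
        return None
    return match.group("ns"), int(match.group("count"))


if __name__ == "__main__":
    print(parse_migrations(
        "{ns_name} migrations: remaining (654,289,254) active (1,1,0) complete-pct 88.49"
    ))
    print(parse_unavailable(
        "{ns_name} rebalanced: regime 295 expected-migrations (826,826) "
        "expected-signals 826 expected-appeals 0 unavailable-partitions 425"
    ))
```

A non-zero count from `parse_unavailable` on a strong-consistency namespace is the signal to investigate first, since it means some partitions cannot be served by that sub-cluster.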