Detail
When a node leaves an existing cluster (e.g., due to a network fault), the value of CLUSTER-SIZE seen in its log may be either 0 or 1. When does each value occur?
Answer
CLUSTER-SIZE 1 indicates that the node has no connectivity to any other node and believes itself to be the only member of the cluster. It will assume master ownership of all 4096 partitions, although only the partitions it owned as master or prole before it left the cluster will contain any records. It will also accept all transactions, unless min-cluster-size is greater than 1. This is sometimes referred to as a "split-brain" scenario.
CLUSTER-SIZE 0 indicates that the node has some connectivity to one or more of the other cluster nodes but is not able to join the cluster. This can happen when the node has one-way connectivity loss, when it can reach only a subset of the other nodes that does not include the paxos master, or in other unusual network situations.
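The distinction above can be checked mechanically when scanning a node's log. The following is a minimal sketch, not an official tool: it assumes only that the token CLUSTER-SIZE is followed by an integer somewhere in the log line. The function name and the returned labels are hypothetical and chosen for illustration.

```python
import re
from typing import Optional

# Assumption: the log line contains the token CLUSTER-SIZE followed by an integer.
CLUSTER_SIZE_RE = re.compile(r"CLUSTER-SIZE\s+(\d+)")


def classify_cluster_size(line: str) -> Optional[str]:
    """Return a hypothetical label for the CLUSTER-SIZE value found in a log line.

    Returns None when the line carries no CLUSTER-SIZE value.
    """
    match = CLUSTER_SIZE_RE.search(line)
    if match is None:
        return None
    size = int(match.group(1))
    if size == 0:
        # Some connectivity to other nodes, but the node cannot join the cluster.
        return "partial-connectivity"
    if size == 1:
        # No connectivity to any other node; the node believes it is alone.
        return "isolated"
    # Any larger value means the node is part of a multi-node cluster.
    return "in-cluster"
```

Feeding each line of a log file through classify_cluster_size and noting the first line that returns "isolated" or "partial-connectivity" gives a rough indication of when the node left the cluster and which of the two situations applies.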
Notes
For more on split-brain scenarios, see "How to detect if a cluster went into a split brain situation?" and "FAQ - What diagnostics should be collected in a split brain scenario?"
CLUSTER-SIZE 0 can also occur in any of these scenarios:
- Missing address binding in the fabric configuration when nodes have multiple interfaces bound to different networks (for example, multiple internal networks).
- TLS certificate issues between nodes over fabric.
- Network latency issues between nodes.