Context
If you are running your cluster with replication-factor equal to the number of nodes in the cluster (so that every node holds a copy of all data), and your client policy allows reading from replica nodes, then special care must be taken when quiescing a node.
To understand the issue, we first need to look at how Aerospike determines which nodes hold data. The Aerospike partition distribution algorithm creates a succession list for each individual partition in the cluster. This is an ordered list of the nodes in the cluster, computed algorithmically for each partition. The cluster then takes the first x nodes in this list (where x is your chosen replication factor) to determine which nodes hold data for this partition.
When you quiesce a node, the node is moved to the end of the succession list for all partitions. Assuming you are not storing a copy of the data on all nodes, this moves the quiesced node outside of the first x nodes in the list, meaning it is no longer active for any partition. When the replication factor equals the number of nodes in the cluster, however, the first x nodes are the entire list, so moving a node to the end of the succession list does not take it out of the set of active nodes. The node will therefore continue to serve client traffic if the client policies allow a client to send requests to it.
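The effect described above can be illustrated with a small Python model. This is only a sketch: the hash-based ordering in `succession_list` is a stand-in chosen for illustration and is not Aerospike's actual partition distribution algorithm, and the function names and node names are hypothetical. The only behaviour it models is the one described here: quiesced nodes move to the end of the list, and the first x entries are active.

```python
import hashlib

NODES = ["node1", "node2", "node3", "node4", "node5"]
PARTITIONS = 4096  # number of partitions per Aerospike namespace


def succession_list(nodes, partition_id):
    """Toy per-partition ordering of nodes (NOT Aerospike's real algorithm)."""
    def rank(node):
        return hashlib.sha256(f"{node}:{partition_id}".encode()).hexdigest()
    return sorted(nodes, key=rank)


def active_nodes(nodes, partition_id, replication_factor, quiesced=()):
    """First `replication_factor` nodes after quiesced nodes move to the end."""
    order = succession_list(nodes, partition_id)
    kept = [n for n in order if n not in quiesced]
    moved = [n for n in order if n in quiesced]
    return (kept + moved)[:replication_factor]


def partitions_served(node, replication_factor, quiesced=()):
    """Count the partitions for which `node` is still in the active window."""
    return sum(
        node in active_nodes(NODES, pid, replication_factor, quiesced)
        for pid in range(PARTITIONS)
    )


if __name__ == "__main__":
    for rf in (3, 4, 5):
        served = partitions_served("node5", rf, quiesced={"node5"})
        print(f"replication-factor {rf}: quiesced node5 active for {served} partitions")
```

In this model, a quiesced node drops out of every partition when the replication factor is smaller than the cluster size, but stays active for all partitions when the replication factor equals the number of nodes.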
Method
To work around this scenario, you must also reduce the replication factor when quiescing a node, so that the quiesced node falls outside the active window. The process looks like this:
Quiesce the node:
Admin+> manage quiesce with 10.0.3.224
~~~Quiesce Nodes~~~~
Node|Response
node1:3000|ok
node2:3000|ok
node3:3000|ok
node4:3000|ok
node5:3000|ok
Number of rows: 5
Reduce the replication factor:
Admin+> manage config namespace test param replication-factor to 4
~Set Namespace Param replication-factor to 4~
Node|Response
node1:3000|ok
node2:3000|ok
node3:3000|ok
node4:3000|ok
node5:3000|ok
Number of rows: 5
Finally, issue a recluster so that the two settings above take effect at the same time:
Admin+> manage recluster
Successfully started recluster
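As a rough aid for scripting or double-checking the steps above, the following Python sketch builds the same sequence of asadm commands for a given node and namespace. The function name `quiesce_commands` and its parameters are hypothetical. It only produces the command strings shown in this article and does not talk to a cluster. It assumes a single node is being quiesced, so the replication factor is lowered to one less than the cluster size.

```python
def quiesce_commands(node, namespace, cluster_size, replication_factor):
    """Return, in order, the asadm commands for quiescing one node.

    The replication-factor step is only included when every node holds a
    copy of all data (replication factor >= cluster size), because that is
    the case where the quiesced node would otherwise stay active.
    """
    commands = [f"manage quiesce with {node}"]

    if replication_factor >= cluster_size:
        new_rf = cluster_size - 1
        if new_rf < 1:
            raise ValueError("cannot reduce replication-factor below 1")
        commands.append(
            f"manage config namespace {namespace} "
            f"param replication-factor to {new_rf}"
        )

    # The recluster goes last so that both settings take effect together.
    commands.append("manage recluster")
    return commands


if __name__ == "__main__":
    for command in quiesce_commands("10.0.3.224", "test", 5, 5):
        print(f"Admin+> {command}")
```

For a five-node cluster with replication-factor 5, this prints the three commands used in the procedure above. With a lower replication factor, the configuration step is skipped.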
Notes
When adding the node back to the cluster, remember to revert the replication-factor too.
Dynamic configuration of replication-factor is only available in AP namespaces and from Aerospike Server 6.0 onwards. Prior to this version, a cluster restart was required.