How to split 2 accidentally merged clusters?

Problem Description

Two clusters configured for mesh heartbeats may accidentally join into one if the IP address of a node in one cluster ends up on the list of seed nodes of the other and neither cluster has a cluster-name set. This can happen due to a misconfiguration in aerospike.conf, a user error when dynamically seeding with the tip info command, or reuse of an IP address that formerly belonged to a node in one cluster for a node in the other. This article describes how to separate the clusters again.

WARNING: These instructions only apply when the two original clusters did not have any namespace names in common. If they do share a namespace, data loss may occur. Please see the Notes section for details on such a case.

WARNING: These instructions only apply when the two merged clusters use mesh heartbeats. If they use multicast heartbeats, Aerospike Enterprise customers should contact Aerospike Support.


Explanation

There are two methods of separating the clusters: at the networking level, by blocking inter-node communication with iptables, or at the application level, by setting different cluster-name values on the nodes. We recommend the latter when it fits the use case.

 


    Solution

    1. Preliminary steps

    • Check the config files of all nodes in both clusters and delete any mesh-seed-address-port entry that points to a node in the other cluster. (The mesh seed list on cluster A nodes should not refer to nodes from cluster B, and vice versa.) This prevents the clusters from rejoining the next time nodes are started.

    • Use tip-clear to dynamically remove the unneeded nodes from every node's mesh seed list. For example:

    asinfo -v 'tip-clear:host-port-list=172.3.0.1:3002,172.3.0.2:3002, ...'
    
    • This command must include the mesh seed IP address and port for every node in the other cluster, not just the ones in the config file (run on each node of cluster A with all addresses for cluster B, and then on each node of cluster B with all addresses for cluster A). Please review release notes if you are using DNS in your mesh seed list.

    • Verify that the new configuration is correct by executing the following command on each node and looking for the Heartbeat Dump information in the aerospike.log file:

    asinfo -v 'dump-hb:verbose=true'
    
    • Each node should only have the addresses of the other nodes in its cluster; if it has any nodes from the other cluster, modify and rerun the tip-clear command until the clusters no longer have references to each other in "HB Mesh Nodes".
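The host-port-list argument can be assembled from a plain list of addresses, and a copy of the heartbeat dump can be scanned for leftover references. The following is a minimal sketch, not an official tool: the function names build_host_port_list and find_foreign_addrs, the 10.0.0.x addresses, and the idea of piping the relevant log lines into the checker are all assumptions made for illustration. The tip-clear command is printed for review rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: join "IP:PORT" pairs with commas for tip-clear.
# Usage: build_host_port_list PORT IP [IP ...]
build_host_port_list() {
    port=$1
    shift
    list=""
    for ip in "$@"; do
        # Prepend a comma only when the list already has an entry
        list="${list:+${list},}${ip}:${port}"
    done
    printf '%s\n' "$list"
}

# Hypothetical helper: read text on stdin (for example, heartbeat dump lines
# copied from aerospike.log) and print every given IP that still appears in it.
# Usage: find_foreign_addrs IP [IP ...] < dump.txt
find_foreign_addrs() {
    text=$(cat)
    for ip in "$@"; do
        # -F: fixed string, -w: whole word, so 10.0.0.1 does not match 10.0.0.11
        if printf '%s\n' "$text" | grep -qwF -- "$ip"; then
            printf '%s\n' "$ip"
        fi
    done
}

# Example addresses for the other cluster (assumed, replace with real ones)
CLUSTER_B_IPS="10.0.0.1 10.0.0.2"

# Print the command for review instead of running it
echo "asinfo -v 'tip-clear:host-port-list=$(build_host_port_list 3002 $CLUSTER_B_IPS)'"
```

If find_foreign_addrs prints nothing for a node's dump, that node no longer lists any of the given addresses.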

     

    2. Splitting a cluster using the iptables method

    • Using the same list of cluster B's IP addresses, run these commands on every node of cluster A, then run them on every node of cluster B using cluster A's IP addresses. Note that the list after "for SOURCEIP in" is separated by spaces and does not include port numbers, unlike the list passed to tip-clear.
    for SOURCEIP in 172.3.0.1 172.3.0.2 ... ; do
        for PORT in 3001 3002 ; do
            iptables -I INPUT -p tcp -m tcp --source ${SOURCEIP} --dport ${PORT} -j REJECT --reject-with icmp-host-unreachable
            iptables -I OUTPUT -p tcp -m tcp --destination ${SOURCEIP} --dport ${PORT} -j REJECT --reject-with icmp-host-unreachable
        done
    done
    • After a few seconds, the clusters should now be separate. We recommend setting cluster-name on all nodes of both clusters to prevent this from happening again.
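Before changing firewall rules on a production node, it can help to generate the commands and read them first. The sketch below is one possible way to do that and is not part of the original procedure: the function name print_hb_rules and the 10.0.0.x addresses are hypothetical. Passing -D instead of -I prints commands that delete the same rules, which is an assumption about how you might clean up later (for example, once cluster-name is set everywhere).

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the iptables commands for one action.
# ACTION is -I to insert the blocking rules or -D to delete the same rules.
# Usage: print_hb_rules ACTION IP [IP ...]
print_hb_rules() {
    action=$1
    shift
    for ip in "$@"; do
        # Same ports as in the loop above: 3001 and 3002
        for port in 3001 3002; do
            echo "iptables $action INPUT -p tcp -m tcp --source $ip --dport $port -j REJECT --reject-with icmp-host-unreachable"
            echo "iptables $action OUTPUT -p tcp -m tcp --destination $ip --dport $port -j REJECT --reject-with icmp-host-unreachable"
        done
    done
}

# Review the generated commands; run them as root only once they look right.
print_hb_rules -I 10.0.0.1 10.0.0.2
```

Each address produces four commands (two ports, two directions), so the output is easy to check by counting lines.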

     

    3. Splitting a cluster using the cluster-name method

    • Choose names for the two clusters. In these instructions, we will use "ClusterA" and "ClusterB", but any two distinct names are fine. On every node of cluster A, execute the command:
    asinfo -v 'set-config:context=service;cluster-name=ClusterA'
    • Then, on every node of cluster B:
    asinfo -v 'set-config:context=service;cluster-name=ClusterB'
    • After a few seconds, the clusters should now be separate. Add the cluster-name settings to aerospike.conf to make them persistent and prevent this from happening again.
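When there are many nodes, the per-node commands can be generated from a host list. This is a minimal sketch under stated assumptions: the function name print_cluster_name_cmds and the 10.0.x.x addresses are hypothetical, and it assumes each node is reachable with asinfo's -h host option from the machine where you run the output. The commands are printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: print one set-config command per node.
# Usage: print_cluster_name_cmds NAME HOST [HOST ...]
print_cluster_name_cmds() {
    name=$1
    shift
    for host in "$@"; do
        echo "asinfo -h $host -v 'set-config:context=service;cluster-name=$name'"
    done
}

# Example host lists (assumed, replace with the real nodes of each cluster)
print_cluster_name_cmds ClusterA 10.0.1.1 10.0.1.2
print_cluster_name_cmds ClusterB 10.0.0.1 10.0.0.2

# To persist the setting, the service context of aerospike.conf on a
# cluster A node would contain a line such as:
#     cluster-name ClusterA
```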

    Notes

    • When setting a cluster-name on an Aerospike server node, the cluster-name will be returned to the client. If the client has a cluster-name set that doesn't match, the transaction will fail. If the client does have a matching cluster-name or if it doesn't have any cluster-name set, the transaction will proceed.
    • tip-clear may not work as expected on version 5.2 and below. The issue has been resolved in [AER-5852] - (CLUSTERING) tip-clear info command must specify seed as it was configured (DNS name or IP address).
    • It may be necessary to restart all the clients in order to force them to connect to the relevant clusters.
    • When next deploying your client code, note that you can set the cluster name within the client policy. This will ensure that the client will only ever connect to the intended cluster, confirming its name.
    • If a namespace has the same name in both clusters, the data will automatically merge through migrations. At that stage, splitting the cluster means that you would have to either recover data from a backup or retain the merged data in both clusters. To retain the same data on both clusters, divide the clusters by rack: configure rack-id so that nodes from the first cluster are in rack 1 and nodes from the second cluster are in rack 2. This ensures that all data from the merged namespace is available on both clusters once the split happens. Make the change in the aerospike.conf files first, followed by the dynamic change, so that a node that restarts ends up in the correct rack.
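The dynamic rack-id change for the shared-namespace case could be generated along the following lines. Treat this as a hedged sketch rather than a verified procedure: the function name print_rack_cmds, the namespace name "shared_ns", and the addresses are hypothetical, and the set-config and recluster info commands are shown as we understand them, so confirm the exact syntax against the documentation for your server version before running anything. The commands are printed only.

```shell
#!/bin/sh
# Hypothetical helper: print the dynamic rack-id commands for one group of nodes.
# Usage: print_rack_cmds NAMESPACE RACK_ID HOST [HOST ...]
print_rack_cmds() {
    ns=$1
    rack=$2
    shift 2
    for host in "$@"; do
        echo "asinfo -h $host -v 'set-config:context=namespace;id=$ns;rack-id=$rack'"
    done
    # Assumption: a recluster request is needed for the new rack-id to take
    # effect; it is printed for every node in the group.
    for host in "$@"; do
        echo "asinfo -h $host -v 'recluster:'"
    done
}

# Nodes of the first original cluster go to rack 1, the second to rack 2
print_rack_cmds shared_ns 1 10.0.1.1 10.0.1.2
print_rack_cmds shared_ns 2 10.0.0.1 10.0.0.2
```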

    Applies To Earliest Version

    Pre 4.9

    Applies To Latest Version

    Current Version