Problem Description
I see the warning "could not create heartbeat connection to node IP" for a node that I no longer have in the cluster, even though I issued the tip-clear command.
Explanation
This happens when the heartbeat seed nodes were specified by DNS name or FQDN and the underlying IP address of the removed node changed, or a new node was added with the same DNS name but a different IP address. In the affected versions, Aerospike resolves the DNS name only once, so tip-clear does not behave as expected.
A node removal would trigger the following messages in the Aerospike server logs:
Aug 07 2019 01:45:34 GMT: WARNING (socket): (socket.c:959) (repeated:20) Error while connecting socket to 10.0.88.238:3002
Aug 07 2019 01:45:34 GMT: WARNING (hb): (hb.c:4882) (repeated:20) could not create heartbeat connection to node {10.0.88.238:3002}
In such cases, it is recommended to run tip-clear to clear the IP from the mesh mode heartbeats.
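To find which addresses to pass to tip-clear, the heartbeat warnings can be filtered out of the server log. The following is a minimal sketch, not an official tool: the function name `unreachable_peers` is hypothetical, and it assumes the warning text has exactly the form shown in the log excerpt above.

```shell
# Print each distinct peer address that the heartbeat subsystem failed to reach.
# Reads Aerospike server log lines on stdin.
# Assumption: the warning has the form
#   "could not create heartbeat connection to node {IP:PORT}"
unreachable_peers() {
    sed -n 's/.*could not create heartbeat connection to node {\([^}]*\)}.*/\1/p' | sort -u
}
```

For example, `unreachable_peers < /var/log/aerospike/aerospike.log` (the log path is an assumption and depends on your logging configuration) would print 10.0.88.238:3002 for the log lines above.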
However, there can be situations in which the tip-clear command does not get rid of those messages, for example:
asinfo -v 'tip-clear:host-port-list=10.0.88.238:3002'
error: 0 cleared, 1 not found
The server log then has the following messages, indicating that the IP address does not exist as a seed node:
Aug 07 2019 01:57:11 GMT: INFO (info): (thr_info.c:827) tip clear command received: params host-port-list=10.0.88.238:3002
Aug 07 2019 01:57:11 GMT: WARNING (info): (thr_info.c:883) seed node 10.0.88.238:3002 does not exist
In such situations, check whether you are using an IP or a hostname in the mesh-seed-address-port configuration. If you are using DNS, try using the DNS name in the tip-clear command:
asinfo -v 'tip-clear:host-port-list=node3.example.com:3002'
ok
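To check how the seeds were configured, the mesh-seed-address-port lines can be listed from the configuration file. This is a rough sketch under stated assumptions: the function name `mesh_seeds` is hypothetical, each seed is assumed to be written as `mesh-seed-address-port <host> <port>` on one line, and only dotted IPv4 literals are classified as "ip" (anything else, including IPv6 literals, is reported as "name").

```shell
# Read an Aerospike configuration on stdin and print one line per heartbeat seed:
#   "<kind> <host>:<port>", where kind is "ip" or "name".
# Assumption: only dotted-quad IPv4 literals count as "ip".
mesh_seeds() {
    awk '$1 == "mesh-seed-address-port" {
        kind = ($2 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) ? "ip" : "name"
        print kind, $2 ":" $3
    }'
}
```

Running `mesh_seeds < /etc/aerospike/aerospike.conf` (path assumed to be the default install location) lists the seeds in the same host:port form that tip-clear expects, so a "name" entry suggests passing the DNS name rather than the IP address.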
However, if the server log still contains the warnings after issuing the tip-clear command, the underlying IP address behind the DNS name might have changed:
Aug 07 2019 01:58:07 GMT: INFO (info): (thr_info.c:827) tip clear command received: params host-port-list=node3.example.com:3002
Aug 07 2019 01:58:07 GMT: INFO (info): (thr_info.c:900) tip clear command executed: cleared 1, params host-port-list=node3.example.com:3002
Aug 07 2019 01:58:09 GMT: WARNING (socket): (socket.c:959) (repeated:20) Error while connecting socket to 10.0.88.238:3002
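One way to confirm that the DNS name no longer maps to the address in the warnings is to compare the current resolution with the old IP. The sketch below is illustrative only: the function name `no_longer_resolves_to` is hypothetical, and it assumes its input is the output of `getent hosts <name>`, where the first field of each line is an address.

```shell
# Succeeds (exit 0) when none of the resolved addresses on stdin equals the given IP.
# stdin: output of `getent hosts <name>`, one "address name..." line per address.
# An empty input (name does not resolve at all) also counts as "no longer resolves to".
no_longer_resolves_to() {
    awk -v old="$1" '$1 == old { hit = 1 } END { exit (hit ? 1 : 0) }'
}
```

For example, `getent hosts node3.example.com | no_longer_resolves_to 10.0.88.238 && echo "DNS name now points elsewhere"` prints the message only when the name has moved away from the old address, which is the situation the workaround in the Solution section addresses.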
Solution
There is a workaround to stop the warnings about the old IP address in aerospike.log.
- Temporarily add an entry to /etc/hosts, on the host that logs the warnings, with the DNS name and the old IP address, and re-run tip-clear using that DNS name.
- Then clean up the hosts file (after successfully running the tip-clear command).
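The two steps above could be scripted roughly as follows. This is a hedged sketch, not a supported procedure: the function name `tip_clear_with_old_ip` and the `HOSTS_FILE` and `ASINFO` override variables are hypothetical (they exist so the function can be tried against a copy of the hosts file), the heartbeat port is assumed to default to 3002, and the function must run as a user allowed to modify the hosts file. It also assumes the hosts file takes precedence over DNS and contains no earlier entry for the same name.

```shell
# Temporarily map NAME to OLD_IP in the hosts file, run tip-clear with NAME,
# then restore the original hosts file.
# Usage: tip_clear_with_old_ip NAME OLD_IP [PORT]
# HOSTS_FILE and ASINFO are hypothetical overrides for testing.
tip_clear_with_old_ip() {
    name=$1
    old_ip=$2
    port=${3:-3002}
    hosts=${HOSTS_FILE:-/etc/hosts}
    asinfo_cmd=${ASINFO:-asinfo}

    # Save the current hosts file before touching it.
    cp -p "$hosts" "$hosts.orig" || return 1

    # Add the temporary entry so the name resolves to the old IP again.
    printf '%s %s\n' "$old_ip" "$name" >> "$hosts"

    "$asinfo_cmd" -v "tip-clear:host-port-list=$name:$port"
    status=$?

    # Restore the original file whether or not tip-clear succeeded.
    cp -p "$hosts.orig" "$hosts"
    return "$status"
}
```

Called as `tip_clear_with_old_ip node3.example.com 10.0.88.238`, it runs the same commands as the manual procedure and leaves the saved copy next to the hosts file as hosts.orig.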
Example:
- Save /etc/hosts
cp hosts hosts.orig
- Add to /etc/hosts
10.0.88.238 node3.example.com
- Then run on each node:
asinfo -v 'tip-clear:host-port-list=node3.example.com:3002'
- Restore the original /etc/hosts
cp hosts.orig hosts
Notes
- This issue may also lead to a new server receiving the old IP address and then accidentally joining the wrong cluster. Setting cluster-name in /etc/aerospike/aerospike.conf is recommended to prevent a node from joining the wrong cluster. For further details, refer to the article "How to split 2 accidentally merged clusters?".
- This issue was resolved in version 5.3:
  - [AER-5852] - (CLUSTERING) tip-clear info command must specify seed as it was configured (DNS name or IP address).
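To tell whether a node already runs a version with this fix, the version string can be compared against 5.3. The sketch below is an assumption-laden helper, not part of Aerospike: the function name `has_tip_clear_fix` is hypothetical, and it expects a version string with at least a major and a minor component, such as 5.2.0.6.

```shell
# Succeeds (exit 0) when the given server version is 5.3 or later.
# Assumption: the argument looks like MAJOR.MINOR[.PATCH...] with numeric fields.
has_tip_clear_fix() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # second numeric field
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 3 ]; }
}
```

A possible use is `has_tip_clear_fix "$(asinfo -v build)" || echo "workaround may be needed"`, assuming `asinfo -v build` prints only the version string on your installation.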