
Max FD reached on Tie-breaker

Detail

When a newly added tie-breaker node is started in an Aerospike cluster, why does the node reach the configured proto-fd-max?
 

Answer

A tie-breaker node was added to the cluster. When Aerospike was started on it, the node reached the configured connection limit (proto-fd-max 50000).

The following error messages could be found in the log of the tie-breaker node:
fd 361 send failed, errno 113
(security): (security.c:1986) fd 901 send failed, errno 32
mesh size recv failed fd 361: Transport endpoint is not connected
(service): (service.c:432) refusing client connection - proto-fd-max 50000
(socket.c:994) Error while getting remote name: 107 (Transport endpoint is not connected)

errno 32 (EPIPE, broken pipe) indicates that the client closed the socket before the server could respond. The client may have timed out or may have too many open connections, or there may be network issues.

errno 107 (ENOTCONN, "Transport endpoint is not connected") means that the connection was closed on the client side. There were no connection-related warnings on the other nodes in the cluster.
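The errno numbers that appear in the log lines above can be translated into their symbolic names and messages with a few lines of Python. This is only a small sketch: the helper name describe_errno is made up for illustration, and errno numbering is platform-specific, so it should be run on a Linux host like the one that produced the log. On Linux, 32 is EPIPE, 107 is ENOTCONN and 113 is EHOSTUNREACH (no route to host).

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an errno value on the current platform.

    describe_errno is a hypothetical helper, not part of any Aerospike tool.
    """
    # errno.errorcode maps numeric values to symbolic names such as 'EPIPE'.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the operating system's message for the value.
    return f"{name}: {os.strerror(code)}"


# errno values copied from the tie-breaker log lines in this article.
# The numeric values are assumed to be Linux errno numbers.
for code in (32, 107, 113):
    print(code, describe_errno(code))
```

Decoding the values this way makes it easier to tell peer-side closes (EPIPE, ENOTCONN) apart from routing problems (EHOSTUNREACH) when reading the log.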

This points to a network-layer issue affecting the tie-breaker node. A network issue can cause clients to hold connections open to the tie-breaker node while their transactions sit in the queue and time out, which in turn drives the node to its proto-fd-max limit.

For further troubleshooting, work with your network team to check connectivity between the tie-breaker node and the clients, and between the tie-breaker node and the other nodes in the cluster. Running netcat against the destination IP address and port number is a quick way to test network connectivity.
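Where netcat is not installed, the same kind of TCP reachability test can be approximated in Python. The following is a rough sketch rather than an official tool: the function names can_connect and check_targets, the IP address, and the port list are all assumptions for illustration. Ports 3000, 3001 and 3002 are the commonly used Aerospike defaults for service, fabric and mesh heartbeat; check the node's own configuration for the real values.

```python
import socket
from typing import List, Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> Tuple[bool, str]:
    """Attempt a TCP connection and report whether it succeeded.

    Hypothetical helper; similar in spirit to a netcat port probe.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False, f"{type(exc).__name__}: {exc}"


def check_targets(targets: List[Tuple[str, int]], timeout: float = 3.0) -> None:
    """Print one result line per (host, port) pair."""
    for host, port in targets:
        ok, detail = can_connect(host, port, timeout)
        status = "OK" if ok else "FAILED"
        print(f"{host}:{port} {status} ({detail})")


# Example call (address and ports are placeholders, not taken from the article):
# check_targets([("10.0.0.10", 3000), ("10.0.0.10", 3001), ("10.0.0.10", 3002)])
```

Running the check from a client machine and from each of the other cluster nodes toward the tie-breaker node, and then in the reverse direction, helps narrow down which path is affected.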
 

Notes

The tie-breaker node should be a machine similar to the other nodes in terms of network connectivity and bandwidth. This is especially crucial if the tie-breaker node also happens to be the principal in the cluster (the node with the highest node-id).

Applies To Earliest Version

Pre 4.9

Applies To Latest Version

Current Version