Context
This article explains how to troubleshoot a "Node not found for partition" error received by a client.
For example, in a C# client application the exception looks like this:
Aerospike.Client.AerospikeException+InvalidNode: Error -3: Node not found for partition testnamespace:1095
The AEROSPIKE_ERR_INVALID_NODE error means the client has no node mapped to a given partition (in the example above, partition ID 1095 in the namespace testnamespace). This error is uncommon. It can indicate a problem, such as a connectivity issue, that prevents the client from properly ‘tending’ all nodes in the cluster (tending is how a client learns which partitions each node owns). It can also occur when the client marks a node as inactive, because the client drops all partitions mapped to a node that becomes inactive.
When a node is genuinely inactive (it has left the cluster and no other node reports it as a peer), the remaining nodes take ownership of its partitions, so under normal circumstances this should not cause the error on the client side. The error can also occur if a namespace has been removed from the cluster but the client has not been restarted to pick up the new list of namespaces.
The AEROSPIKE_ERR_INVALID_NODE error is mapped to error code -3 in current client versions and to error code -8 in older client versions. The C client library still maps it to -8.
In general, error codes greater than 0 are generated by the server and error codes less than 0 are generated by the client. There are exceptions, such as AEROSPIKE_ERR_TIMEOUT (9), which can be generated by either the client or the server. The server's PARTITION_UNAVAILABLE (11) is equivalent to AEROSPIKE_ERR_CLUSTER (11). AEROSPIKE_ERR_INVALID_NODE (-8 or -3) and AEROSPIKE_ERR_CONNECTION (-10) are generated only by the client.
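The sign rule above can be sketched as a small helper for reading result codes out of application logs. This is an illustrative sketch only, not part of any Aerospike client library: the function name error_origin and the lookup tables are hypothetical, the tables contain only the codes named in this article, and treating 0 as success is an assumption.

```python
# Codes named in this article. This is NOT a complete list of Aerospike result codes.
KNOWN_CODES = {
    -3: "AEROSPIKE_ERR_INVALID_NODE",
    -8: "AEROSPIKE_ERR_INVALID_NODE (older clients and the C client)",
    -10: "AEROSPIKE_ERR_CONNECTION",
    9: "AEROSPIKE_ERR_TIMEOUT",
    11: "AEROSPIKE_ERR_CLUSTER / PARTITION_UNAVAILABLE",
}

# Exceptions to the sign rule: codes that either side can generate.
BOTH_SIDES = {9}


def error_origin(code: int) -> str:
    """Classify a result code as client-generated, server-generated, or both."""
    if code in BOTH_SIDES:
        return "client or server"
    if code < 0:
        return "client"
    if code > 0:
        return "server"
    return "none"  # assumption: 0 means success, so there is no error origin


def describe(code: int) -> str:
    """Return a one-line description suitable for a log annotation."""
    name = KNOWN_CODES.get(code, "code not listed in this article")
    return f"{code}: {name} (generated by: {error_origin(code)})"


if __name__ == "__main__":
    for code in (-3, -8, -10, 9, 11):
        print(describe(code))
```

For the example exception in this article, error_origin(-3) returns "client", which matches the statement that AEROSPIKE_ERR_INVALID_NODE is raised by the client rather than returned by a server node.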
Method
1. If this is a strong consistency namespace, follow the Node Not Found For Partition using AQL with Strong Consistency article.
2. Check the release notes for any known bugs.
3. Use the following commands to confirm the partition mappings for all the nodes in the cluster:
asadm -e "show pmap"
asadm -e "asinfo -v 'partition-info' -l"
4. Use the explain command in aql to determine which nodes own the partition the record belongs to. For example:
aql> explain select * from MyNamespace.MySet where PK=12345
5. Finally, check whether any nodes have restarted or whether the cluster has changed in any other way.
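Once the per-node partition ownership from step 3 has been collected, a quick way to spot a gap is to look for partition IDs that no node claims. The sketch below assumes the ownership data has already been extracted by hand or by your own tooling into a mapping of node name to partition IDs; it does not parse asadm or asinfo output, and the function name, node names, and sample data are all hypothetical. It relies on each namespace having 4096 partitions (IDs 0 to 4095).

```python
from typing import Dict, Iterable, List

# Each Aerospike namespace is divided into 4096 partitions (IDs 0..4095).
N_PARTITIONS = 4096


def unowned_partitions(
    owners: Dict[str, Iterable[int]], n_partitions: int = N_PARTITIONS
) -> List[int]:
    """Return the partition IDs that no node in `owners` claims."""
    claimed = set()
    for partition_ids in owners.values():
        claimed.update(partition_ids)
    return [pid for pid in range(n_partitions) if pid not in claimed]


# Hypothetical two-node ownership data, made up for illustration only.
example = {
    "node-A": [pid for pid in range(0, 2048) if pid != 1095],
    "node-B": range(2048, 4096),
}

if __name__ == "__main__":
    print(unowned_partitions(example))  # prints [1095]
```

An empty result means every partition has at least one owner in the data you collected, which points toward the client's view of the cluster (tending or connectivity) rather than the cluster's own partition map.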