Problem Description
asrestore fails with the error “Max key-put retries exceeded (5)”.
Command used for asrestore:
asrestore --host <IP Address> --directory /<directory-path> --namespace <namespace name>
Explanation
asrestore reports no error other than “Max key-put retries exceeded (5)”, even with the --verbose option, so a dev build was used to enable more detailed asrestore log output. The detailed log shows the following errors.
Example asrestore output with detailed logging:
2022-12-28 13:21:11 GMT [VER] [13000] Retrying unhandled write error - code -6: Socket create failed: -1 BB96B0016AC0142 172.22.0.107:3000 at src/main/aerospike/as_event_uv.c:1400
2022-12-28 13:21:11 GMT [INF] [13032] Restoring /mnt/userprofile_00348.asb
2022-12-28 13:21:11 GMT [VER] [13032] Opening backup file /mnt/userprofile_00348.asb
2022-12-28 13:21:11 GMT [VER] [13000] Retrying unhandled write error - code -6: Socket create failed: -1 BB96B0016AC0142 172.22.0.107:3000 at src/main/aerospike/as_event_uv.c:1400
2022-12-28 13:21:11 GMT [VER] [13000] Retrying unhandled write error - code -6: Socket create failed: -1 BB96B0016AC0142 172.22.0.107:3000 at src/main/aerospike/as_event_uv.c:1400
2022-12-28 13:21:11 GMT [ERR] [13032] Failed to open file /mnt/userprofile_00348.asb in "r" mode (error 24: Too many open files)
2022-12-28 13:21:11 GMT [ERR] [13032] Error while opening backup file
2022-12-28 13:21:11 GMT [VER] [13000] Retrying unhandled write error - code -6: Socket create failed: -1 BB96B0016AC0142 172.22.0.107:3000 at src/main/aerospike/as_event_uv.c:1400
2022-12-28 13:21:11 GMT [VER] [13000] Retrying unhandled write error - code -6: Socket create failed: -1 BB96B0016AC0142 172.22.0.107:3000 at src/main/aerospike/as_event_uv.c:1400
The original failure, “Max key-put retries exceeded (5)”, and the “Too many open files” error are related. The default per-process file descriptor limit on Linux is 1024, and asrestore can use up to 32 * 16 = 512 async connections, each of which is a socket. For server versions before 6.0, the number of connections is max-async-batches * batch-size; the default for max-async-batches is 32 and the default for batch-size is 16.
If there are many backup files, asrestore can exceed the maximum number of open file descriptors. Once the limit is reached, the C client fails to open new sockets and returns error -6 (AEROSPIKE_ERR_ASYNC_CONNECTION). asrestore retries the affected writes, and when the retries run out it reports the “Max key-put retries exceeded” error.
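The arithmetic can be checked against a concrete case. The sketch below is illustrative only: the helper name estimate_fds and the figure of 600 backup files are assumptions made for the example, and the defaults are the ones quoted in this article for servers before 6.0.

```shell
#!/usr/bin/env bash
# Estimate how many file descriptors asrestore may need against a pre-6.0
# server: one per backup file plus one socket per async connection.
# estimate_fds is a hypothetical helper, not part of asrestore.
estimate_fds() {
    local backup_files=$1
    local max_async_batches=${2:-32}   # default quoted in this article
    local batch_size=${3:-16}          # default quoted in this article
    echo $(( backup_files + max_async_batches * batch_size ))
}

# Example: 600 backup files (an assumed figure) with the default settings.
estimate_fds 600    # prints 1112, which is above the default limit of 1024

# Show the soft limit of the current shell for comparison.
ulimit -n
```

With these numbers the sockets alone take 512 descriptors, so roughly 500 backup files are enough to reach the default limit.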
Solution
To resolve this, check the per-process file descriptor limit and raise it for asrestore if it is too low.
For servers before 6.0, the limit should be at least number_of_backup_files + (max-async-batches * batch-size). For server 6.0 and later, it should be at least number_of_backup_files + max-async-batches. Set the limit somewhat higher than the result of the formula to leave a safety margin. Alternatively, restore from fewer, larger backup files.
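One possible way to apply the formula is sketched below. It is a minimal sketch under stated assumptions, not an official procedure: required_fd_limit is a hypothetical helper, backup files are assumed to end in .asb as in the log above, and the margin of 64 is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Compute a descriptor limit from the backup directory.
# required_fd_limit is a hypothetical helper, not part of asrestore.
#   $1  backup directory
#   $2  connection count: max-async-batches * batch-size before server 6.0,
#       max-async-batches for server 6.0 and later
#   $3  extra margin (assumed default: 64)
required_fd_limit() {
    local backup_dir=$1
    local connections=$2
    local margin=${3:-64}
    local files
    # Count the backup files directly inside the directory (assumes *.asb).
    files=$(find "$backup_dir" -maxdepth 1 -type f -name '*.asb' | wc -l)
    echo $(( files + connections + margin ))
}

# Example use in the shell that will launch asrestore (pre-6.0 defaults):
#   ulimit -n "$(required_fd_limit /<directory-path> $((32 * 16)))"
#   asrestore --host <IP Address> --directory /<directory-path> --namespace <namespace name>
# ulimit -n changes the soft limit for this shell and its child processes only,
# and it fails if the value exceeds the hard limit reported by ulimit -Hn.
```

Raising the limit in the launching shell keeps the change scoped to that session; a permanent change to the hard limit needs system-level configuration, which is outside this sketch.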