Problem Description
A node has run out of file descriptors and is logging the following warnings:
Nov 14 2017 00:54:57 GMT: WARNING (demarshal): (thr_demarshal.c:454) Hit OS file descriptor limit (EMFILE on accept). Consider raising limit for uid 0
...
Nov 14 2015 03:41:56 GMT: WARNING (cf:socket): (socket.c::274) socket: Too many open files
Explanation
By default, the Aerospike init script or service configuration sets the maximum number of open files to 100000 (ulimit -n 100000). This per-process maximum number of file descriptors can be adjusted statically or dynamically. It is possible that the configured limit is set too low for the current environment and workload.
Solution
1. Modifying the Max Number of File Descriptors for asd statically
If for some reason the soft and hard limits are smaller than expected, you may need to set this value manually by modifying /etc/security/limits.conf.
Setting Open Files limit for all users
- Edit the following file: /etc/security/limits.conf
In the following example, the wildcard * sets the limit for any user to 100000 files.
* hard nofile 100000
* soft nofile 100000
Note: You may need to log out and log in again after changing this file for the setting to take effect.
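After logging in again, the limits that a new session actually inherits can be checked from the shell. The following is a minimal sketch; the function name show_nofile_limits is made up for illustration, and it reports the limits of the shell that runs it, not of a running asd process.

```shell
# Print the soft and hard open-file limits of the current shell session.
# show_nofile_limits is a hypothetical helper name used for this example.
show_nofile_limits() {
    echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
}

show_nofile_limits
```

If the values printed do not match what was written to /etc/security/limits.conf, the session most likely predates the change.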
Setting Open Files limit for asd process in non-systemd environments
- Edit the following file /etc/init.d/aerospike
Modify the following line to set the maximum number of file descriptors for the asd process.
ulimit -n 100000
Setting Open Files limit for asd process in systemd environments
- Create a file named override.conf under /etc/systemd/system/aerospike.service.d/
With the following lines:
[Service]
LimitNOFILE=<MAX NUMBER OF FILE DESCRIPTORS>
- Example file:
root@myserver:~# cat /etc/systemd/system/aerospike.service.d/override.conf
[Service]
LimitNOFILE=200000
- Reload systemd daemon
systemctl daemon-reload
- Restart the aerospike service
systemctl restart aerospike.service
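The systemd steps above can be sketched as a small script. This is an illustration only: write_nofile_override is a hypothetical helper name, and the systemctl calls are left commented out so that pasting the block has no side effects. The last commented command uses systemctl show to print the limit systemd has recorded for the unit.

```shell
# write_nofile_override DIR LIMIT
# Writes DIR/override.conf containing a [Service] section with LimitNOFILE=LIMIT.
# Hypothetical helper; DIR is a parameter so the function can be tried in a
# scratch directory before touching /etc/systemd/system.
write_nofile_override() {
    dir=$1
    limit=$2
    # Reject anything that is not a plain non-negative integer.
    case $limit in
        ''|*[!0-9]*) echo "limit must be an integer" >&2; return 1 ;;
    esac
    mkdir -p "$dir" || return 1
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/override.conf"
}

# Typical use, as root:
#   write_nofile_override /etc/systemd/system/aerospike.service.d 200000
#   systemctl daemon-reload
#   systemctl restart aerospike.service
#   systemctl show aerospike.service --property=LimitNOFILE
```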
2. Modifying the Max Number of File Descriptors for asd dynamically
- Modifying nofile with prlimit (Aerospike version 5.0.0.3 and above, see Notes):
prlimit --pid `pgrep asd` --nofile=200000
- Verify the setting (5967 is the pid of the asd process in this example):
prlimit --pid 5967
RESOURCE DESCRIPTION SOFT HARD UNITS
AS address space limit unlimited unlimited bytes
CORE max core file size 0 unlimited bytes
CPU CPU time unlimited unlimited seconds
DATA max data size unlimited unlimited bytes
FSIZE max file size unlimited unlimited bytes
LOCKS max number of file locks held unlimited unlimited locks
MEMLOCK max locked-in-memory address space 65536 65536 bytes
MSGQUEUE max bytes in POSIX mqueues 819200 819200 bytes
NICE max nice prio allowed to raise 0 0
NOFILE max number of open files 200000 200000 files
NPROC max number of processes 63329 63329 processes
RSS max resident set size unlimited unlimited bytes
RTPRIO max real-time priority 0 0
RTTIME timeout for real-time tasks unlimited unlimited microsecs
SIGPENDING max number of pending signals 63329 63329 signals
STACK max stack size 8388608 unlimited bytes
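To avoid reading the full table, the set and verify steps can be combined so that only the NOFILE values are printed. This is a sketch that assumes the util-linux prlimit command; raise_nofile is a hypothetical helper name. Raising the hard limit of another process normally requires root.

```shell
# raise_nofile PID LIMIT
# Sets both the soft and hard open-file limit of PID to LIMIT, then prints
# the resulting "SOFT HARD" pair without the table header.
# Hypothetical helper built on util-linux prlimit.
raise_nofile() {
    pid=$1
    limit=$2
    prlimit --pid "$pid" --nofile="$limit" || return 1
    prlimit --pid "$pid" --nofile --noheadings --output SOFT,HARD
}

# Typical use, as root, assuming a single asd process:
#   raise_nofile "$(pgrep -x asd)" 200000
```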
3. Verification of current values
The current max open files setting can be verified on a running Aerospike server by reading /proc/XXX/limits, where XXX is the pid of the running asd process:
root@u15:~# cat /proc/`pgrep asd`/limits
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 256380 256380 processes
Max open files 100000 100000 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 256380 256380 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
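The limit is more useful when compared with the number of descriptors the process currently holds. The sketch below reads both from /proc on Linux; the function names are made up for illustration, and listing /proc/PID/fd for a process owned by another user requires root.

```shell
# nofile_soft_limit PID: soft "Max open files" value from /proc/PID/limits.
nofile_soft_limit() {
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

# open_fd_count PID: number of file descriptors PID currently has open.
open_fd_count() {
    ls "/proc/$1/fd" | wc -l
}

# fd_usage PID: one-line summary of current usage against the soft limit.
fd_usage() {
    echo "pid=$1 open=$(open_fd_count "$1") soft_limit=$(nofile_soft_limit "$1")"
}

# Typical use, as root, assuming a single asd process:
#   fd_usage "$(pgrep -x asd)"
```

An open count close to the soft limit suggests the node is about to log the warnings shown in the problem description.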
Notes
It is possible that this system setting change may fail due to permission issues.
- Always set ulimit before starting the Aerospike service. You should be able to adjust the maximum number of open files dynamically afterward using the prlimit command.
- Never use prlimit to increase the maximum number of open files dynamically for the asd process prior to version 5.0.0.3 (which is the version where AER-6218 was addressed).
- Always set proto-fd-max to a value lower than the open files limit, as open files include non-client-related file descriptors.
Always ensure that the proto-fd-max configuration setting has a value that is less than the system/process limit. Depending on the size of the cluster and the configuration, it should be hundreds to thousands smaller.
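The headroom rule can be expressed as a simple arithmetic check. The sketch below is illustrative: check_proto_fd_headroom is a hypothetical helper, and its default minimum headroom of 1000 is an assumed value chosen from the "hundreds to thousands" guidance, not an Aerospike recommendation.

```shell
# check_proto_fd_headroom PROTO_FD_MAX NOFILE_LIMIT [MIN_HEADROOM]
# Succeeds when NOFILE_LIMIT exceeds PROTO_FD_MAX by at least MIN_HEADROOM.
# MIN_HEADROOM defaults to 1000, an assumption made for this example.
check_proto_fd_headroom() {
    proto=$1
    limit=$2
    min=${3:-1000}
    headroom=$((limit - proto))
    if [ "$headroom" -ge "$min" ]; then
        echo "ok headroom=$headroom"
    else
        echo "too-small headroom=$headroom"
        return 1
    fi
}

# The inputs could be gathered on a node roughly like this (assumes asinfo is
# installed and a single asd process is running):
#   proto=$(asinfo -v 'get-config:context=service' | tr ';' '\n' | sed -n 's/^proto-fd-max=//p')
#   limit=$(awk '/^Max open files/ {print $4}' "/proc/$(pgrep -x asd)/limits")
#   check_proto_fd_headroom "$proto" "$limit"
```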
In older versions of the Aerospike database, dynamically increasing the process system limit after the asd process has started and increasing the proto-fd-max configuration to a higher value than the original process system limit can lead to the following assertion:
FAILED ASSERTION (service): (service.c:294) cannot get free slot
This has been addressed in 2 steps:
- Versions prior to 5.0 will not allow raising the proto-fd-max configuration higher than the original process system limit (check the release notes for exact versions that have the improvement tracked under AER-6211):
  [AER-6211] - (KVS) Dynamically increasing both the system file descriptor limit and service context configuration item 'proto-fd-max' to a value larger than the system limit at startup may cause an assert.
- Versions 5.0 and above allow dynamically increasing the process system limit (through the prlimit Linux command) and then increasing proto-fd-max. That improvement was tracked under:
  [AER-6218] - (KVS) Service context configuration item 'proto-fd-max' may now be dynamically set as high as the system's current process file descriptor limit.
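Because the safe procedure depends on the server version, a script that raises the limit dynamically may want to check the version first. The following is a minimal sketch, assuming a sort that supports -V (GNU coreutils); both function names are hypothetical, and the 5.0.0.3 threshold is the version named in the Notes above.

```shell
# version_at_least VERSION MINIMUM
# Succeeds when dotted VERSION is greater than or equal to MINIMUM.
# Relies on "sort -V" (version sort), available in GNU coreutils.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# prlimit_safe_for VERSION
# Prints "yes" when VERSION is 5.0.0.3 or later, otherwise "no".
prlimit_safe_for() {
    if version_at_least "$1" 5.0.0.3; then
        echo yes
    else
        echo no
    fi
}

# The running version could be obtained with, for example:
#   asinfo -v build
```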