Detail
This article answers some common questions about what happens when a storage device becomes unavailable and how Aerospike behaves in that situation.
Answer
1. What happens to the client writes when the disk is not available for some time?
>>> When there is a temporary hardware failure and the storage is not available, client writes still go through and the data is successfully written to the in-memory buffer. If data-in-memory is set to true, the data is successfully written to memory as well. Disk availability is only checked when these writes are flushed to the disk, and that flush fails. Aerospike is designed to abort (with SIGABRT, to be precise) when the integrity of the data is potentially compromised, so the server shuts down through a SIGABRT when the writes are flushed to the disk.
When using file storage, simply deleting the file does not abort the Aerospike service (see question 4).
2. Will write buffer get exhausted in this case?
>>> Yes, disk unavailability can cause the write buffer to fill up. In practice, however, writes are flushed periodically as per the configuration, and a failed flush aborts the Aerospike service before the buffer becomes full.
3. Will Aerospike be able to write data back to the disks if they are available after some time?
>>> No. All writes that were in the in-memory buffer when the Aerospike service aborted are lost. If a hardware fault causes a disk to die or disappear, you should restart the application or service that was using the disk. If it was the root disk, the OS should be restarted.
4. What happens when using a file storage and the file gets deleted for some reason?
>>> When using file storage, simply deleting the file does not abort the Aerospike service. When the service opens a file for writing and gets an FD, it is holding a file handle that refers to an inode number, which is the actual reference point for the file. The file therefore stays on the file system: it is only really deleted once no name references the inode and all file handles to it are closed.
This means that after a file is deleted, it still exists as an inode as long as at least one FD is open, so applications can still write to it. Once the last application closes its FD, the inode is released and the file actually gets deleted. This is purely how the file system works.
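The file system behavior described above can be reproduced outside Aerospike. The following is a minimal Python sketch, assuming a POSIX system such as Linux; the function name `write_after_unlink` is hypothetical and the code is not taken from Aerospike. It opens a temporary file, deletes its name, and then keeps writing and reading through the open FD.

```python
import os
import tempfile


def write_after_unlink():
    """Open a file, delete its name, and keep using the still-open FD."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"before unlink\n")
        os.unlink(path)                      # removes the name, not the inode
        name_exists = os.path.exists(path)   # the path no longer resolves
        os.write(fd, b"after unlink\n")      # the write still succeeds
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 1024)             # both writes are readable
    finally:
        os.close(fd)                         # last FD closed: inode is released
    return name_exists, data


if __name__ == "__main__":
    print(write_after_unlink())
```

The data written after the unlink is only reachable through the open FD. Once that FD is closed, the space is reclaimed and the contents are gone.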
Whether the storage is a file or a raw device does not matter: if the underlying disk goes away, writes fail with errno 5 (input/output error), and when the writes are flushed to the disk the server shuts down through a SIGABRT.
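As an illustration of the abort-on-I/O-error pattern, here is a minimal Python sketch. It is not Aerospike's implementation: `flush_or_abort`, `write_fn` and `abort_fn` are hypothetical names introduced for this example. It relies only on the facts that `errno.EIO` is the input/output error (value 5 on Linux) and that `os.abort()` raises SIGABRT in the calling process.

```python
import errno
import os


def flush_or_abort(write_fn, data, abort_fn=os.abort):
    """Call write_fn(data); abort the process if it fails with EIO.

    write_fn and abort_fn are injected so the behavior can be exercised
    without a failing disk and without killing the interpreter.
    """
    try:
        write_fn(data)
        return True
    except OSError as exc:
        if exc.errno == errno.EIO:
            abort_fn()        # os.abort() raises SIGABRT and never returns
            return False      # reached only when abort_fn is a test double
        raise                 # other errors are left to the caller


if __name__ == "__main__":
    def failing_write(_data):
        # Simulates a flush to a disk that has disappeared.
        raise OSError(errno.EIO, os.strerror(errno.EIO))

    flush_or_abort(failing_write, b"record", abort_fn=lambda: print("would abort"))
```

Aborting instead of retrying reflects the design goal stated earlier: once a flush fails, the integrity of the data can no longer be guaranteed.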
5. Similarly, what happens if the folder or disk hosting the log file is unavailable for some time and then reappears? Will Aerospike be able to write to the log file again?
>>> If writes to the server log fail due to a hardware failure, the failure is ignored and the Aerospike process continues to run. To resume logging, however, you have to restart the service, because the existing file descriptor (FD) is no longer connected to the file system.
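To see why an old FD stays disconnected even after the path reappears, the following Python sketch compares the inode behind an open FD with the inode currently at the path. It is a generic POSIX illustration, not something Aerospike does; `fd_is_stale` and `demo` are hypothetical names.

```python
import os
import tempfile


def fd_is_stale(fd, path):
    """Return True if fd no longer refers to the file currently at path."""
    try:
        on_disk = os.stat(path)
    except FileNotFoundError:
        return True                          # the path is gone entirely
    opened = os.fstat(fd)
    return (opened.st_dev, opened.st_ino) != (on_disk.st_dev, on_disk.st_ino)


def demo():
    """Show the check before deletion, after deletion, and after re-creation."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.log")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        try:
            fresh = fd_is_stale(fd, path)    # FD and path share one inode
            os.unlink(path)
            missing = fd_is_stale(fd, path)  # path removed, FD still open
            with open(path, "w"):
                pass                         # a new file appears at the same path
            replaced = fd_is_stale(fd, path) # new file has a different inode
        finally:
            os.close(fd)
    return fresh, missing, replaced


if __name__ == "__main__":
    print(demo())
```

A process that keeps writing to the old FD never reaches the new file at the same path, which is why the log only resumes after the file is reopened, for example by a restart.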