Problem Description
Aerospike log file contains this warning:
WARNING (xdr): (xdr_dlog.c:568) (repeated:173276) XDR digestlog cannot keep up with writes. Dropping record.
Older versions would have the following:
WARNING (xdr): (xdr.c:5021) Ran out of queue space... XDR cannot keep up with write .. some records may be lost!!!
Explanation
The warning refers to an internal in-memory queue that batches digest log entries before they are persisted to disk. The queue holds 1,000,000 entries; when it is full, new entries are dropped and the above message is printed.
Here are the three common situations that would cause this:
- The filesystem partition is full so the digest log is not able to expand as it is a sparse file. You will also see the following WARNING in the log file:
WARNING (xdr): (xdr_dlog.c:390) Digest Log Write Failed !!! ... Critical error
Older Aerospike versions have a different line number:
```
WARNING (xdr): (xdr.c:4887) Digest Log Write Failed !!! ... Critical error
```
- The disk is not keeping up, so the internal queue fills faster than it can be written to the digest log. This could be due to a slow disk, a remote mount, or simply too much data being written to a potentially shared partition.
- XDR is enabled but the remote DCs are all INACTIVE. In this case XDR does not reclaim processed entries from the digest log, so it grows until it reaches its full size (and starts overwriting older records). As the digest log grows, the internal logic that works out the last ship time can take longer and, because it runs under a lock, it prevents new entries from being flushed to disk. If the load is high enough that the digest log entries waiting in the internal queue reach the limit, this WARNING message is triggered (the ul is 1000101 in the example).
Apr 24 2017 10:00:26 GMT-0700: INFO (xdr): (xdr.c:2027) sh 0 : ul 65 : lg 7516764045 : rlg 0 : lproc 7516764000 : rproc 479081584 : lkdproc 0 : errcl 0 : errsrv 0 : hkskip 1250365174 745690834 : flat 0
Apr 24 2017 10:02:25 GMT-0700: INFO (xdr): (xdr.c:1773) Reclaimed 0 records space in digest log...
Apr 24 2017 10:02:25 GMT-0700: INFO (xdr): (xdr.c:2027) sh 0 : ul 1000101 : lg 7516889145 : rlg 0 : lproc 7516889100 : rproc 479081584 : lkdproc 0 : errcl 0 : errsrv 0 : hkskip 1250365641 745691255 : flat 0
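To spot this condition before records are dropped, the ul field of the periodic XDR INFO line can be tracked against the queue limit. The following is a minimal sketch only, not an Aerospike tool: it assumes the log line layout shown in the example above, reads ul as the number of entries waiting in the internal queue (as this article does), and uses hypothetical names (QUEUE_LIMIT, queue_depths, depths_at_limit).

```python
import re

# Queue size stated in this article; treated here as the alert threshold.
QUEUE_LIMIT = 1_000_000

# Matches the "ul <number>" field of an XDR INFO line such as
# "... INFO (xdr): (xdr.c:2027) sh 0 : ul 65 : lg ...".
# Assumption: the field layout is the one shown in the example above.
UL_FIELD = re.compile(r"\(xdr\).*?\bul (\d+)\b")


def queue_depths(lines):
    """Yield (line, ul) for every log line that carries a ul field."""
    for line in lines:
        match = UL_FIELD.search(line)
        if match:
            yield line, int(match.group(1))


def depths_at_limit(lines, limit=QUEUE_LIMIT):
    """Return the ul values that have reached or passed the limit."""
    return [ul for _, ul in queue_depths(lines) if ul >= limit]
```

Feeding the three example lines above to depths_at_limit would flag only the last one, where ul is 1000101.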
Solution
- Refer to the Solution: Digestlog Partition Out Of Space article for how to handle situations where the digestlog runs out of space.
- If the digestlog is not keeping up, note that you should NOT use direct disk-based storage for xdr-digestlog-path. The digestlog is designed to work very well on files. Keeping the digestlog directly on the disk may result in the writes not keeping up. Using file-backed storage allows more writes to happen, because: a) reads will, in most cases, be served directly from the filesystem's in-RAM cache, and b) bursts of writes will land in the filesystem dirty cache before being slowly flushed to disk.
If you use direct disk-backed storage, you cannot take advantage of these features. For example, if you intend to use /dev/sdb for your digestlog, first put a filesystem on /dev/sdb, then mount /dev/sdb somewhere (e.g. on /xdr) and add the mount to /etc/fstab. Then configure xdr-digestlog-path /xdr/digestlog.
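As a quick sanity check of the two storage-related causes, the configured path can be inspected to see whether it is a raw block device or a regular file, and how much room is left on its filesystem. This is a minimal sketch using only the Python standard library; the function name digestlog_path_report and the shape of the returned dictionary are illustrative assumptions, not part of Aerospike.

```python
import os
import shutil
import stat


def digestlog_path_report(path):
    """Describe what kind of object `path` is and how full its filesystem is."""
    mode = os.stat(path).st_mode
    if stat.S_ISBLK(mode):
        kind = "block device"   # e.g. /dev/sdb used directly: not recommended above
    elif stat.S_ISREG(mode):
        kind = "regular file"   # e.g. /xdr/digestlog on a mounted filesystem
    else:
        kind = "other"

    report = {"path": path, "kind": kind, "free_fraction": None}
    if kind == "regular file":
        # Free space on the filesystem holding the file; a full partition
        # stops the sparse digest log file from expanding.
        usage = shutil.disk_usage(os.path.dirname(os.path.abspath(path)))
        if usage.total:
            report["free_fraction"] = usage.free / usage.total
    return report
```

Calling digestlog_path_report("/xdr/digestlog") on a node set up as in the example above should report a regular file; a result of "block device" would suggest the path points at the disk directly.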