Problem Description
When two clusters are connected by XDR, the source cluster runs Aerospike 4.8.0.1 or later, and the destination runs Aerospike 4.3.1.x or earlier, the following warning can appear in aerospike.log on the destination:
Aug 10 2020 11:46:54 GMT: WARNING (demarshal): (thr_demarshal.c:719) as_proto decompression failed! (rv -3)
In some cases, this can also cause a destination node to crash.
Explanation
Aerospike 4.8.0.1 introduced support for client/server compression as part of AER-6136. When Aerospike sends a compressed message, it includes a field that holds the uncompressed size of the record. From Aerospike 4.8.0.1 onward, this field is written in correct network byte order (big-endian).
Older Aerospike servers expect this size field to be little-endian.
As a result, when a newer source cluster ships compressed records to an older destination cluster, the destination reads the size field with its bytes in the wrong order and interprets the uncompressed size as implausibly large. The destination then rejects the record.
A destination node can crash when the function that logs a hex dump of the rejected message (cf_warning_binary) overflows the 64 KB buffer allocated for that dump.
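The 64 KB limit matters because a hex dump takes roughly three characters per byte of message. The sketch below illustrates that arithmetic. It assumes each byte is rendered as two hex digits plus a separating space, as in the HexSpaced dump in the log messages later in this article, with one extra byte for a terminating NUL. The function names are hypothetical and do not come from the Aerospike source.

```python
BUFFER_SIZE = 64 * 1024  # 64 KB dump buffer described in this article


def hex_spaced(data: bytes) -> str:
    """Render bytes as 'xx xx xx', like the HexSpaced log dump (assumed format)."""
    return " ".join(f"{b:02x}" for b in data)


def dump_size(message_len: int) -> int:
    """Characters needed for a spaced dump of message_len bytes, plus a NUL.

    n bytes -> 2n hex digits + (n - 1) spaces + 1 NUL terminator = 3n.
    """
    return 3 * message_len if message_len > 0 else 1


def max_dumpable_bytes(buffer_size: int = BUFFER_SIZE) -> int:
    """Largest message whose spaced dump (including the NUL) fits in buffer_size."""
    return buffer_size // 3


if __name__ == "__main__":
    print(max_dumpable_bytes())
```

Under these assumptions, a rejected message larger than 21,845 bytes would produce a dump too large for the buffer.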
If this buffer overflows, the server terminates with a SIGSEGV. While the issue is occurring, the log shows messages such as the following:
Sep 23 2021 18:21:34 GMT: WARNING (demarshal): (thr_demarshal.c:719) as_proto decompression failed! (rv -3)
Sep 23 2021 18:21:34 GMT: WARNING (demarshal): (thr_demarshal.c:720) compressed proto_p<HexSpaced>:02 04 cf 01 00 00 00 00 00 00 00 00 00 00 02 66 78 5e 55 91 3f 68 <...> 08 3a ca 9a 9e bc d9 48 fd 01 86 bc aa 6f
Sep 23 2021 18:21:34 GMT: WARNING (demarshal): (thr_demarshal.c:719) as_proto decompression failed! (rv -3)
Sep 23 2021 18:21:34 GMT: WARNING (demarshal): (thr_demarshal.c:719) as_proto decompression failed! (rv -3)
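The byte-order mismatch can be illustrated with the hex dump above. The sketch decodes bytes 9 to 16 of the compressed proto (00 00 00 00 00 00 02 66) both ways. Treating those eight bytes as the 64-bit uncompressed-size field is an assumption drawn from reading the dump, not from the Aerospike source.

```python
# Bytes 9-16 of the compressed proto in the hex dump above.
# Assumption: these eight bytes hold the 64-bit uncompressed-size field.
SIZE_FIELD = bytes.fromhex("00 00 00 00 00 00 02 66")


def size_as_sent(field: bytes) -> int:
    """Big-endian (network byte order), as written by 4.8.0.1 and later."""
    return int.from_bytes(field, byteorder="big")


def size_as_read_by_old_server(field: bytes) -> int:
    """Little-endian, as expected by 4.3.1.x and earlier."""
    return int.from_bytes(field, byteorder="little")


if __name__ == "__main__":
    print(f"sender meant:   {size_as_sent(SIZE_FIELD)} bytes")
    print(f"receiver reads: {size_as_read_by_old_server(SIZE_FIELD):#x} bytes")
```

The sender intends 614 bytes, but an older server reads 0x6602000000000000 bytes, which is why it treats the record as unfeasibly large and rejects it.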
Solution
This issue occurs only when shipping to an XDR destination that runs a server version that is no longer supported. The most robust solution is to upgrade all XDR clusters in the topology to currently supported versions.
A short-term workaround is to disable XDR compression on the source cluster. If the source runs a version earlier than Aerospike 5.0, set xdr-compression-threshold to 0 with the following command:
asinfo -v 'set-config:context=xdr;xdr-compression-threshold=0'
If the source runs Aerospike 5.0 or later, disable compression with the enable-compression parameter, which is set per DC and namespace (replace DC1 and namespaceName with your own values):
asinfo -v "set-config:context=xdr;dc=DC1;namespace=namespaceName;enable-compression=false"
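Because the setting to change depends on the source version, the choice can be captured in a small helper. This is a hypothetical sketch, not an Aerospike tool: it only builds the two asinfo commands shown above, and DC1 and namespaceName are the same placeholders used in this article.

```python
def disable_xdr_compression_cmd(source_version: str,
                                dc: str = "DC1",
                                namespace: str = "namespaceName") -> str:
    """Return the asinfo command that disables XDR compression on the source.

    Hypothetical helper: it picks one of the two commands in this article
    based on the major version of the *source* cluster.
    """
    major = int(source_version.split(".")[0])
    if major < 5:
        # Pre-5.0 XDR: a context-wide threshold; 0 switches compression off.
        return "asinfo -v 'set-config:context=xdr;xdr-compression-threshold=0'"
    # 5.0+ XDR: compression is configured per DC and namespace.
    return ('asinfo -v "set-config:context=xdr;'
            f'dc={dc};namespace={namespace};enable-compression=false"')


if __name__ == "__main__":
    print(disable_xdr_compression_cmd("4.8.0.1"))
    print(disable_xdr_compression_cmd("5.2.0"))
```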