Why does a node configured for all-flash crash with 'too many chunks'?

Problem Description

An Aerospike node with the primary index on disk shuts down with the following messages in the log:
Oct 07 2021 12:51:23 GMT: CRITICAL (arenax): (arenax_ee.c:98) too many chunks

Oct 07 2021 12:51:23 GMT: WARNING (as): (signal.c:218) SIGUSR1 received, aborting Aerospike Enterprise Edition build 5.6.0.8 os el7

Oct 07 2021 12:51:23 GMT: WARNING (as): (log.c:630) stacktrace: registers: rax 0000000000000000 rbx 000000000000000a rcx 00007f4983521690 rdx 0000000000000000 rsi 00007f49719f9fb0 rdi 0000000000000002 rbp 0000000000000064 rsp 00007f49719f9fb0 r8 0000000000000000 r9 00007f49719f9fb0 r10 0000000000000008 r11 0000000000000246 r12 0000000000001926 r13 00007f46bbd67808 r14 00007f4971a00800 r15 000000000bcea660 rip 00007f4983521690

Oct 07 2021 12:51:23 GMT: WARNING (as): (log.c:643) stacktrace: found 11 frames: 0x6862a1 0x4ef4fb 0x7f49835217e0 0x7f4983521690 0x685a67 0x666ae7 0x4abdbc 0x4abe18 0x6746a7 0x7f498351740b 0x7f49820c50bf offset 0x0

Explanation

This error will occur when there has been a serious misconfiguration of partition-tree-sprigs on the all-flash namespace.

A sprig is a branch of the primary index. When the index is held in RAM, the number of branches is largely unimportant until the index becomes very large. Increasing the number of sprigs increases index efficiency at the expense of memory consumption.

When the index is on disk (all-flash), the number of sprigs becomes much more important. Indexes consist of 4 KiB disk blocks (chunks) and, ideally, a single record lookup should require one I/O and no more.

When sizing all-flash installations, care is taken to estimate the size of the index required and to calculate the right number of partition-tree-sprigs such that all index entries are stored at the desired Fill Fraction (see below) and that each sprig uses a single 4 KiB chunk and no more.
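The sizing calculation described above can be sketched roughly as follows. This is an illustrative estimate, not the official sizing procedure. It assumes values that this article does not state: 4096 partitions per namespace, 64-byte index entries (so 64 entries fit in a 4 KiB chunk), and a sprig count rounded up to a power of two. The function names are hypothetical, and the real partition-tree-sprigs setting may have bounds that this sketch does not model. Confirm all of these against the capacity planning guide before relying on the result.

```python
import math

# Assumptions (not stated in this article; verify against the capacity planning guide):
PARTITIONS = 4096                                # partitions per namespace
ENTRY_BYTES = 64                                 # size of one primary index entry
CHUNK_BYTES = 4 * 1024                           # 4 KiB chunk, as described in the article
ENTRIES_PER_CHUNK = CHUNK_BYTES // ENTRY_BYTES   # 64 entries per chunk


def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (minimum 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def suggest_partition_tree_sprigs(unique_records: int, fill_fraction: float) -> int:
    """Estimate sprigs per partition so that each sprig fits in one chunk.

    unique_records: expected number of unique records in the namespace.
    fill_fraction:  target fraction of a chunk to fill, in the range (0, 1].
    """
    if unique_records < 0:
        raise ValueError("unique_records must be non-negative")
    if not 0 < fill_fraction <= 1:
        raise ValueError("fill_fraction must be in the range (0, 1]")

    # Index entries each sprig should hold at the desired fill fraction.
    entries_per_sprig = ENTRIES_PER_CHUNK * fill_fraction
    # Sprigs needed across the whole namespace, then per partition.
    total_sprigs = unique_records / entries_per_sprig
    sprigs_per_partition = math.ceil(total_sprigs / PARTITIONS)
    # Round up to a power of two (assumed requirement for this setting).
    return next_power_of_two(sprigs_per_partition)


if __name__ == "__main__":
    # Hypothetical workload: 4 billion unique records at a 0.5 fill fraction.
    print(suggest_partition_tree_sprigs(4_000_000_000, 0.5))
```

Rounding up rather than down keeps the average sprig at or below the target fill level, which matches the goal of using one chunk per sprig and no more.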

This is not enforced: the code places no limit on the number of chunks per sprig. It is possible, though deeply inadvisable, for a sprig to consist of many 4 KiB chunks. Each record lookup could then require, potentially, hundreds of I/O operations, which would be extremely detrimental to performance. The system allocates as many chunks as it needs to store the records that are loaded. The number of chunks is not directly configurable; partition-tree-sprigs serves as an indirect control.

The error above occurs when a partition is dropped and its sprigs have more than 100 chunks. When the chunks are cleaned up for reuse, a sanity check runs; if a sprig has more than 100 chunks, the sprig is assumed to be corrupt and the node shuts itself down.
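As a rough pre-check, the average number of chunks per sprig can be estimated from the record count and the configured partition-tree-sprigs, then compared with the 100-chunk threshold described above. This is a minimal sketch, not the server's actual check. It assumes 4096 partitions per namespace and 64 index entries per 4 KiB chunk, neither of which is stated in this article, and it uses hypothetical function names. It works from an average, so individual sprigs may hold more or fewer chunks than the estimate.

```python
import math

# Assumptions (not stated in this article; verify before relying on them):
PARTITIONS = 4096            # partitions per namespace
ENTRIES_PER_CHUNK = 64       # 4 KiB chunk / 64-byte index entry
# From the article: sprigs with more than 100 chunks trip the sanity check.
CHUNK_SANITY_LIMIT = 100


def estimate_chunks_per_sprig(unique_records: int, partition_tree_sprigs: int) -> int:
    """Estimate the average number of 4 KiB chunks each sprig needs."""
    if partition_tree_sprigs <= 0:
        raise ValueError("partition_tree_sprigs must be positive")
    total_sprigs = PARTITIONS * partition_tree_sprigs
    average_entries = unique_records / total_sprigs
    return math.ceil(average_entries / ENTRIES_PER_CHUNK)


def exceeds_sanity_limit(unique_records: int, partition_tree_sprigs: int) -> bool:
    """Return True if the average sprig would exceed the 100-chunk threshold."""
    chunks = estimate_chunks_per_sprig(unique_records, partition_tree_sprigs)
    return chunks > CHUNK_SANITY_LIMIT


if __name__ == "__main__":
    # Hypothetical workload: 10 billion unique records.
    for sprigs in (256, 32768):
        chunks = estimate_chunks_per_sprig(10_000_000_000, sprigs)
        print(sprigs, chunks, exceeds_sanity_limit(10_000_000_000, sprigs))
```

Any estimate above one chunk per sprig already implies more than one I/O per lookup, so a result well below 100 is not by itself evidence of a healthy configuration.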


Solution

If this error is observed, it indicates a major misconfiguration. The sizing should be reviewed carefully, if need be with an Aerospike Solutions Architect, before nodes are restarted.

Notes

  • The Fill Fraction defines the level to which a sprig is filled, leaving room for some expansion without overfilling and consuming more than one chunk per sprig.
  • The Linux Capacity Planning Guide gives details on how to size all-flash installations correctly.

Applies To Earliest Version

5.0

Applies To Latest Version

Current Version