How to Recover Shared Memory (shmem) Usage with Rolling Cold Start

Context

To recover shared memory usage, cold start the nodes one at a time (a rolling cold start). A cold start rebuilds the primary index from storage.

⚠️ Important: You may not want to cold start if:

  • Your application cannot handle the resurrection of deleted objects.

  • Your cluster is already low on RAM, because a cold start may increase the risk of an OOM kill.

For more details, please refer to this KB article:
How do I speed up cold-start eviction?


Method

  1. Disable migrations on all nodes by running the following in asadm (in enable mode):

    asinfo -v "set-config:context=service;migrate-threads=0"

  2. Quiesce the node to transfer partition ownership gracefully to the other nodes, replacing xxx.xxx.xxx.xxx with the IP address of the node:

    manage quiesce with xxx.xxx.xxx.xxx
    manage recluster

  3. Shut down Aerospike on the node.

  4. Trigger a cold start, for example by running the following (see the Notes below for other methods):

    asd-coldstart

  5. After the node rejoins the cluster, re-enable migrations:

    asinfo -v "set-config:context=service;migrate-threads=1"

  6. Wait for migration to complete.

  7. Repeat all the steps for the next node.
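
The steps above can be summarized as a per-node checklist. The following is a minimal dry-run sketch, not an automation tool: it only prints the commands from this article in order and executes nothing. The function name `plan_node` and the example addresses are hypothetical and used purely for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints the rolling cold start checklist for each node.
# Nothing here talks to a cluster; the commands are echoed exactly as they
# appear in the steps above. plan_node and the addresses are hypothetical.

plan_node() {
    target="$1"
    printf '%s\n' \
        "[asadm, enable mode] asinfo -v \"set-config:context=service;migrate-threads=0\"" \
        "[asadm, enable mode] manage quiesce with $target" \
        "[asadm, enable mode] manage recluster" \
        "[on $target] shut down Aerospike" \
        "[on $target] asd-coldstart" \
        "[asadm, enable mode] asinfo -v \"set-config:context=service;migrate-threads=1\"" \
        "[operator] wait for migration to complete"
}

# One node at a time: finish every step for a node before starting the next.
for n in 192.0.2.10 192.0.2.11; do
    echo "== $n =="
    plan_node "$n"
done
```

Printing the plan rather than running it keeps the operator in control of the two manual checkpoints, namely confirming that the node has rejoined the cluster and that migration has completed.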

Notes

  • Using ipcrm to remove shmem segments can also trigger a cold start. This KB article has an example of how to use ipcrm:
    How to change data-size config in a running cluster.

  • Changing certain configuration parameters, such as partition-tree-sprigs, will also trigger a cold start. This may be a good opportunity to apply such changes if you haven’t already.

  • Setting "cold-start-empty true" in aerospike.conf is another option if the cluster is stable, that is, when all data is fully replicated with RF ≥ 2. In this case, the node can rejoin empty and the missing data will be filled in by the other nodes.

  • Remember, cold starting a node can result in the resurrection of deleted data, or of records whose expiration was lowered after they were written.
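
As a rough illustration of the cold-start-empty note above, the setting belongs in the namespace stanza of aerospike.conf. The namespace name `test` and the replication-factor line are placeholders assumed for this sketch, and the rest of the stanza is omitted.

```
namespace test {                # hypothetical namespace name
    replication-factor 2        # assumed value; matches the RF ≥ 2 condition above
    cold-start-empty true       # node rejoins empty after a cold start
    # ... remaining namespace settings unchanged ...
}
```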

Applies To Earliest Version

Pre 4.9

Applies To Latest Version

Current Version