Why is the server no longer accepting writes?

Problem Description

This article explains the scenarios that lead an Aerospike cluster node to reject write operations (inserts or updates).

 


Explanation

The Aerospike database server has mechanisms to protect against running out of memory or disk space.
The server is designed to stop accepting writes if any of the following conditions is met:

  • Memory utilization is above the configured threshold (stop-writes-pct).
  • The available percentage on the disk drops below the configured threshold (min-avail-pct). Situations leading to such a low available percentage include:
    • Defragmentation is not keeping up with the number of objects evicted.
    • Eviction is not keeping up with the rate at which new data is written.
  • For strong-consistency enabled namespaces, clock_skew_stop_writes is triggered when cluster_clock_skew_ms is above the cluster_clock_skew_stop_writes_sec threshold.
  • As of Aerospike Server 4.5.1, for each Available mode (AP) namespace where nsup is enabled (i.e. nsup-period is not zero), writes are suspended if the cluster clock skew exceeds 40 seconds.
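
The conditions above can be sketched as a small check over values collected from a node. This is only an illustration of the logic, not how the server implements it: the function name and the dictionary keys are hypothetical, the values are assumed to have been gathered beforehand, and the behavior exactly at a threshold is an assumption.

```python
# 40 seconds, the AP/nsup clock skew limit stated in the list above.
AP_NSUP_SKEW_LIMIT_MS = 40_000


def stop_write_reasons(ns):
    """Return which of the conditions listed above the namespace `ns` breaches.

    `ns` is a plain dict with hypothetical keys (not a real client API):
    memory_used_pct, stop_writes_pct, device_available_pct, min_avail_pct,
    strong_consistency, nsup_period, cluster_clock_skew_ms,
    cluster_clock_skew_stop_writes_sec.
    """
    reasons = []

    # Memory utilization above stop-writes-pct.
    if ns["memory_used_pct"] > ns["stop_writes_pct"]:
        reasons.append("memory above stop-writes-pct")

    # Available percentage on disk below min-avail-pct.
    if ns["device_available_pct"] < ns["min_avail_pct"]:
        reasons.append("disk available percent below min-avail-pct")

    skew_ms = ns["cluster_clock_skew_ms"]
    if ns["strong_consistency"]:
        # The threshold is in seconds, the measured skew in milliseconds.
        if skew_ms > ns["cluster_clock_skew_stop_writes_sec"] * 1000:
            reasons.append("clock skew above strong-consistency threshold")
    elif ns["nsup_period"] != 0 and skew_ms > AP_NSUP_SKEW_LIMIT_MS:
        reasons.append("clock skew above 40 seconds with nsup enabled")

    return reasons


# Made-up sample values for an AP namespace with nsup enabled.
example = {
    "memory_used_pct": 92, "stop_writes_pct": 90,
    "device_available_pct": 12, "min_avail_pct": 5,
    "strong_consistency": False, "nsup_period": 120,
    "cluster_clock_skew_ms": 100, "cluster_clock_skew_stop_writes_sec": 20,
}
print(stop_write_reasons(example))  # ['memory above stop-writes-pct']
```

An empty list means none of the listed conditions applies to the values supplied.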

Solution

1. To recover from memory utilization capping, increase the cluster capacity by increasing the memory-size if possible or by adding nodes, or simply delete data (for example, using the truncate method).

2. To recover from the minimum available percentage going to 0, refer to the article How do I recover from Available Percent Zero?

3. To recover from evictions not keeping up, refer to the evict-tenth-pct configuration parameter. This configuration can be tuned so that it is large enough to allow eviction to keep pace with the rate at which new data is added to the namespace.
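
For the first option, a rough way to estimate how large memory-size needs to be is to divide the memory currently used by the utilization you want to stay under. The sketch below is only arithmetic under stated assumptions: the function name and the `headroom_pct` margin are hypothetical, and the sample figures are made up.

```python
def min_memory_size_bytes(used_bytes, stop_writes_pct, headroom_pct=10):
    """Smallest memory-size (in bytes) that keeps current usage at least
    `headroom_pct` percentage points below stop-writes-pct.

    `headroom_pct` is a hypothetical safety margin, not an Aerospike setting.
    """
    target_pct = stop_writes_pct - headroom_pct
    if target_pct <= 0:
        raise ValueError("headroom_pct must be smaller than stop_writes_pct")
    # Integer ceiling of used_bytes * 100 / target_pct.
    return (used_bytes * 100 + target_pct - 1) // target_pct


GIB = 2**30
# Made-up figures: 54 GiB in use, stop-writes-pct of 90, 10 points of margin.
needed = min_memory_size_bytes(54 * GIB, stop_writes_pct=90)
print(needed / GIB)  # 67.5
```

With these sample values, usage would sit at 80% of the new memory-size. The same arithmetic can be applied per node after adding nodes, assuming the data is spread evenly across the cluster.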

 


Applies To Earliest Version

4.9

Applies To Latest Version

Current Version