Articles in this section

How long does it take a cluster to reform after a single node event?

Problem Description

In Aerospike logs there are messages that reference the Quantum Interval such as the one shown below.

Oct 24 2018 17:15:10 GMT: INFO (fabric): (fabric.c:2516) fabric: node a1 departed
Oct 24 2018 17:15:10 GMT: INFO (fabric): (fabric.c:934) fabric_node_disconnect(a1)
Oct 24 2018 17:15:11 GMT: INFO (clustering): (clustering.c:6823) dead nodes at quantum start: a1

The quantum interval adds a minimal amount of time to the cluster formation. This article explains the typical quantum interval duration and how it affect cluster reformation time.


Explanation

The Aerospike cluster protocol implements a quantum-based event detection allowing for multiple complex network changes to be merged into a single event which reduces the number of state transitions. Cluster state transitions are expensive in terms of time and resources and so it follows logically that they should be minimised where possible. The purpose of the quantum interval is to avoid reacting too quickly to node arrival and departure events and instead, process a batch of adjacent node events with a single cluster epoch change. This avoids superfluous overhead caused by redundant data re-distribution.

The system collects all clustering trigger events such as node arrivals and departures and only acts on these once every quantum interval. With the default server settings, the quantum interval is equal to 1.8 seconds though it can be adjusted as described below. The quantum interval is computed based on the configured expected network latency between cluster nodes, heartbeat interval and heartbeat timeout, so that all the effects from a single disruption such as node arrival, departure, network link faults or network partitions are seen by all nodes in the cluster before a cluster epoch change is applied.


Solution

Important Note: The following examples are provided as indication only for the theoretical scenario of a single node unexpectedly leaving a cluster without any other disruption across the rest of the cluster. This is not necessarily a likely common real scenario as disruptions in distributed systems are seldom isolated.

1) Key settings that can be tuned and influence cluster formation:

  • latency-max-ms Static parameter set to define network latency. Too low a value can cause excessive rebalance in the event of an unstable cluster. Default is 5ms.
  • HBT= (heartbeat.timeout * heartbeat.interval) = 1500ms or 1.5s in default configuration.
  • D = 2 * (heartbeat.interval + latency-max-ms). Defaults to 310ms.
  • Q = Quantum interval which is min(5000,HBT + D). Default 1810ms, capped at 5000ms.
  • RTT = 2 * latency-max-ms. Represents maximum round trip latency. Defaults to 10ms.
  • C = 5*RTT for clustering messages + 2 * RTT for exchange (time to reform cluster). Defaults to 70ms.

2) Cluster with default configuration

  • Best case clustering reformation after a node dies

HBT + D + (Q/4) + C := 1500 + 310 + (1810/4) + 70 = 2332ms

  • Worst case clustering reformation after a node dies

HBT + D + (Q/4) + C + 1 RTT + (Q/2) := 1500 + 310 + (1810/4) + 70 + 10 + (1810/2) = 3247ms

3) Cluster with typical multi-site configuration

When considering a cluster with a higher intra-node network latency the values change significantly. The following are calculations for a cluster where latency-max-ms has been set to 70ms. This would represent a cluster whereby nodes reside on either side of the United States for example. In that scenario, heartbeat.timeout could be set to 25 and heartbeat.interval to 100ms (those values could differ based on the nature of the cluster, for example cloud vs. baremetal and whether one would prefer more frequent heartbeats with a lower tolerance for missing subsequent ones or vice versa).

  • Best case clustering reformation after a node dies

HBT + D + (Q/4) + C := 2500 + 340 + (2840/4) + 980 = 4530ms

  • Worst case clustering reformation after a node dies

HBT + D + (Q/4) + C + 1 RTT + (Q/2) := 2500 + 340 + (2840/4) + 980 + 140 + (2840/2) = 6090ms


Notes

  • A rebalance will always take place following a cluster event however the migrate-fill-delay command can be used to delay subsequent fill migrations if this is appropriate to the use case.
  • Quantum interval is inferred from latency-max-ms.
  • latency-max-ms is a static parameter that requires a rolling restart to change.

Applies To Earliest Version

Pre 4.9

Applies To Latest Version

Current Version
Was this article helpful?
0 out of 0 found this helpful