Problem Description
In Aerospike logs there are messages that reference the Quantum Interval such as the one shown below.
Oct 24 2018 17:15:10 GMT: INFO (fabric): (fabric.c:2516) fabric: node a1 departed Oct 24 2018 17:15:10 GMT: INFO (fabric): (fabric.c:934) fabric_node_disconnect(a1) Oct 24 2018 17:15:11 GMT: INFO (clustering): (clustering.c:6823) dead nodes at quantum start: a1
The quantum interval adds a minimal amount of time to the cluster formation. This article explains the typical quantum interval duration and how it affect cluster reformation time.
Explanation
The Aerospike cluster protocol implements a quantum-based event detection allowing for multiple complex network changes to be merged into a single event which reduces the number of state transitions. Cluster state transitions are expensive in terms of time and resources and so it follows logically that they should be minimised where possible. The purpose of the quantum interval is to avoid reacting too quickly to node arrival and departure events and instead, process a batch of adjacent node events with a single cluster epoch change. This avoids superfluous overhead caused by redundant data re-distribution.
The system collects all clustering trigger events such as node arrivals and departures and only acts on these once every quantum interval. With the default server settings, the quantum interval is equal to 1.8 seconds though it can be adjusted as described below. The quantum interval is computed based on the configured expected network latency between cluster nodes, heartbeat interval and heartbeat timeout, so that all the effects from a single disruption such as node arrival, departure, network link faults or network partitions are seen by all nodes in the cluster before a cluster epoch change is applied.
Solution
Important Note: The following examples are provided as indication only for the theoretical scenario of a single node unexpectedly leaving a cluster without any other disruption across the rest of the cluster. This is not necessarily a likely common real scenario as disruptions in distributed systems are seldom isolated.
1) Key settings that can be tuned and influence cluster formation:
latency-max-msStatic parameter set to define network latency. Too low a value can cause excessive rebalance in the event of an unstable cluster. Default is 5ms.HBT= (heartbeat.timeout*heartbeat.interval) = 1500ms or 1.5s in default configuration.D= 2 * (heartbeat.interval+latency-max-ms). Defaults to 310ms.Q=Quantum intervalwhich ismin(5000,HBT+D). Default 1810ms, capped at 5000ms.RTT= 2 *latency-max-ms. Represents maximum round trip latency. Defaults to 10ms.C= 5*RTTfor clustering messages + 2 *RTTfor exchange (time to reform cluster). Defaults to 70ms.
2) Cluster with default configuration
-
Best case clustering reformation after a node dies
HBT + D + (Q/4) + C := 1500 + 310 + (1810/4) + 70 = 2332ms
-
Worst case clustering reformation after a node dies
HBT + D + (Q/4) + C + 1 RTT + (Q/2) := 1500 + 310 + (1810/4) + 70 + 10 + (1810/2) = 3247ms
3) Cluster with typical multi-site configuration
When considering a cluster with a higher intra-node network latency the values change significantly. The following are calculations for a cluster where latency-max-ms has been set to 70ms. This would represent a cluster whereby nodes reside on either side of the United States for example. In that scenario, heartbeat.timeout could be set to 25 and heartbeat.interval to 100ms (those values could differ based on the nature of the cluster, for example cloud vs. baremetal and whether one would prefer more frequent heartbeats with a lower tolerance for missing subsequent ones or vice versa).
-
Best case clustering reformation after a node dies
HBT + D + (Q/4) + C := 2500 + 340 + (2840/4) + 980 = 4530ms
-
Worst case clustering reformation after a node dies
HBT + D + (Q/4) + C + 1 RTT + (Q/2) := 2500 + 340 + (2840/4) + 980 + 140 + (2840/2) = 6090ms
Notes
- A rebalance will always take place following a cluster event however the
migrate-fill-delaycommand can be used to delay subsequent fill migrations if this is appropriate to the use case. Quantum intervalis inferred fromlatency-max-ms.latency-max-msis a static parameter that requires a rolling restart to change.