Detail
What is the difference between a namespace configured for data-in-memory true and one configured for storage-engine memory?
Answer
Storage Engine
Every namespace can be identified by the storage-engine it is configured with. A namespace that is configured with storage-engine device is known as a persisted namespace, because its data is stored either on a raw disk or in a file. In either case, data survives even if the server stops unexpectedly.
In the examples below, data is persisted either on the disk /dev/sdb or in the file /opt/aerospike/data/bar.dat. Note that with data-in-memory set to false, only the indexes are stored in memory. All data is stored on the disk or in the file.
The following example illustrates a persisted namespace on a device:
namespace bar {
    replication-factor 2
    memory-size 4G
    default-ttl 30d # 30 days, use 0 to never expire/evict.
    storage-engine device {
        device /dev/sdb
        data-in-memory false
    }
}
For a file-backed namespace, configuration is as follows:
storage-engine device {
    file /opt/aerospike/data/bar.dat
    filesize 16G
    data-in-memory false
}
A namespace that is configured to use storage-engine memory does not store data on disk. If the server fails, the data stored in memory is lost.
The following example shows a namespace configured to use storage-engine memory:
namespace test {
    replication-factor 2
    memory-size 4G
    default-ttl 30d # 30 days, use 0 to never expire/evict.
    storage-engine memory
}
Data-in-Memory
A namespace that is configured for data-in-memory stores all data in memory, as you would expect. The difference between data-in-memory and storage-engine memory is that a namespace with data-in-memory set to true can also be configured to persist its data on disk, and it keeps its index in shared memory across a normal asd process restart.
The following example illustrates this point:
namespace bar {
    replication-factor 2
    memory-size 4G
    default-ttl 30d # 30 days, use 0 to never expire/evict.
    storage-engine device {
        file /opt/aerospike/data/bar.dat
        filesize 16G
        data-in-memory true # Store data in memory in addition to file.
    }
}
We see that this namespace stores data in the file /opt/aerospike/data/bar.dat, and also holds the same data in memory.
In terms of system load, the difference between the two configurations should be negligible if they are sized correctly. In some cases, though, depending on the system configuration (especially swappiness and other memory-related tuning), persisted namespaces with data-in-memory set to true may experience occasional spikes in load and latency while the system manages the file cache involved in storing the data in a file. Finally, the two configurations differ in maximum record size: namespaces in memory without persistence limit record size to 128 MiB, whereas persisted namespaces limit record size to the configured write-block-size (maximum of 8 MiB).
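As a rough sketch of where the record-size limit for a persisted namespace is set, the storage-engine stanza from the data-in-memory example could specify write-block-size explicitly. The value 1M here is only an illustrative assumption, not a recommendation, and the fragment assumes a server version in which write-block-size is configurable.

```
storage-engine device {
    file /opt/aerospike/data/bar.dat
    filesize 16G
    write-block-size 1M # Assumed illustrative value; caps record size for this namespace (8M is the maximum).
    data-in-memory true # Store data in memory in addition to file.
}
```

Under this configuration the record-size limit is governed by write-block-size, even though a full copy of the data is also held in memory.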
Notes
Please be aware that the above configuration (data-in-memory true with disk persistence) should not be used as an alternative if the storage subsystem is not capable of handling the write load by itself. The following examples should help illustrate this.
- For workloads that do not read a record before updating it (pure replaces or new record creations), if the storage subsystem cannot handle the workload when data-in-memory is set to false, then moving to data-in-memory true with the same disk for persistence will not help, because every write must still be persisted to that disk. It will most likely lead to "queue too deep" issues.
- For read-update workloads, moving to a data-in-memory true configuration will help, as the reads will be served from memory rather than from the storage subsystem.
It is always good practice to benchmark the different configurations under a relevant workload before making a decision for production use.
More details on the different configuration recipes can be found on the following page: https://docs.aerospike.com/server/operations/configure/namespace/storage