Problem Description
One metric that may be useful to monitor, depending on the use case, is client_write_error, which tracks errors returned for client writes. Several different conditions can cause this metric to increment.
Explanation
The client_write_error statistic increments whenever a client write fails on the server.
You can monitor these errors with your preferred monitoring plug-in, or use asadm to check them on a per-namespace basis. Because client_write_error is a cumulative statistic, you need to compare the counts over time. Note that it does not include transactions that timed out on the server.
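Because the statistic is cumulative, a single reading says little on its own. The following is a minimal Python sketch of turning periodic readings into a per-second error rate. The function name write_error_rates and the sample values are illustrative, and the handling of a decreasing count (treated as a counter reset, for example after a node restart) is an assumption rather than documented server behaviour.

```python
from typing import List, Tuple


def write_error_rates(samples: List[Tuple[float, int]]) -> List[float]:
    """Return errors per second between consecutive (timestamp, count) samples.

    Assumption: a count lower than the previous one means the counter was
    reset, so the new value is taken as the increase for that interval.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            raise ValueError("samples must be in increasing time order")
        delta = c1 - c0 if c1 >= c0 else c1
        rates.append(delta / elapsed)
    return rates


# Illustrative values only: client_write_error read every 10 seconds.
samples = [(0.0, 100), (10.0, 100), (20.0, 150), (30.0, 5)]
rates = write_error_rates(samples)
print(rates)
```

A rate that stays at zero means the counter is flat. A sustained non-zero rate is the signal to work through the causes listed under Solution.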
Solution
The solution involves either changing the client policy or handling the cause indicated by the error code.
If this metric is increasing, it can be for one or more of the following reasons:
1. Some possibilities that are accompanied by a change in an error-specific statistic:
- If the write fails the generation check: fail_generation
- If the key requested to be written is a hot key: fail_key_busy
- If the record size is bigger than write-block-size: fail_record_too_big
- If the key written is from XDR when the allow-xdr-writes configuration is false, or if the key is from a non-XDR client when the allow-non-xdr-writes configuration is false: fail_xdr_forbidden
2. Some possibilities which would be accompanied by server log warnings:
- If the drives are full.
- If an invalid-ttl is specified.
- If the storage is overloaded (you would see "queue too deep" warnings in the log).
- Policy-related problems with the incoming transaction or problems writing to storage that would trigger log warnings prefaced by " {namespace} write_master:" along with explanatory text about the exact problem.
- Attempting to create a record in a set whose name is too long.
- Attempting to create a record in a set, but the message has a missing or mismatched set name.
- Attempting a durable delete with the Community Edition server.
- If there are issues accessing a stored key.
3. Some other possibilities which would not be accompanied by server log warnings:
- If the namespace is under stop-writes.
- During an update/replace operation, if the record is either not found, or found but expired or truncated.
- If create-only is specified, but the record already exists.
- When creating a record in a set while stop-writes is in effect on the set.
- If creating a record that would be immediately truncated.
- Attempting to write in a set currently being deleted (for server versions earlier than 3.14).
- If a touch operation is performed on a record that doesn't exist.
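The first group above can be checked mechanically by comparing two snapshots of the namespace statistics. The Python sketch below assumes the statistics are available as semicolon-separated name=value text; the helper names (parse_stats, increase, suspects) and the sample strings are hypothetical. It only reports which error-specific statistics moved and does not assume they add up to client_write_error.

```python
from typing import Dict, List

# Error-specific statistics named in the first group of the list above.
SPECIFIC_STATS = [
    "fail_generation",
    "fail_key_busy",
    "fail_record_too_big",
    "fail_xdr_forbidden",
]


def parse_stats(raw: str) -> Dict[str, str]:
    """Parse semicolon-separated 'name=value' text into a dict."""
    pairs = (item.split("=", 1) for item in raw.split(";") if "=" in item)
    return {name.strip(): value.strip() for name, value in pairs}


def increase(before: Dict[str, str], after: Dict[str, str], name: str) -> int:
    """Change in one counter between snapshots (a missing counter counts as 0)."""
    return int(after.get(name, 0)) - int(before.get(name, 0))


def suspects(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Names of error-specific statistics that rose between two snapshots."""
    return [s for s in SPECIFIC_STATS if increase(before, after, s) > 0]


# Illustrative snapshots only, not real server output.
before = parse_stats("client_write_error=10;fail_generation=2;fail_key_busy=0")
after = parse_stats("client_write_error=25;fail_generation=9;fail_key_busy=0")

if increase(before, after, "client_write_error") > 0:
    found = suspects(before, after)
    print(found or "no error-specific statistic moved")
```

If client_write_error rose but none of these statistics did, the cause is more likely in the second or third group, so the server log warnings and the return codes seen by the client are the next places to look.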
Notes
In all of these instances, a corresponding return code would be sent back to the client.
Example: A list of Java client error codes can be found on this GitHub page: ResultCode.java
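Since every failure returns a code to the client, one way to see which cause dominates is to tally failures by code on the client side. The Python sketch below illustrates the idea only: WriteFailed, tally_failures, fake_write and the code value 999 are all hypothetical stand-ins, not part of any client library. In a real application the exception type and codes would come from the client you use.

```python
from collections import Counter
from typing import Callable, Iterable


class WriteFailed(Exception):
    """Hypothetical stand-in for a client library's write exception."""

    def __init__(self, code: int, message: str = "") -> None:
        super().__init__(message)
        self.code = code


def tally_failures(write: Callable[[str], None], keys: Iterable[str]) -> Counter:
    """Attempt each write and count failures by return code."""
    failures = Counter()
    for key in keys:
        try:
            write(key)
        except WriteFailed as exc:
            failures[exc.code] += 1
    return failures


def fake_write(key: str) -> None:
    # Illustrative behaviour: keys starting with "bad" fail with a made-up code.
    if key.startswith("bad"):
        raise WriteFailed(code=999, message="simulated failure")


counts = tally_failures(fake_write, ["a", "bad1", "b", "bad2"])
print(counts)
```

Looking up the most frequent codes in the client's result code list then points to the matching cause in the Solution section.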