Transaction failed due to conflict with XDR

Problem Description

A standalone cluster, one that is neither an XDR source nor an XDR destination, can throw this exception. The message looks misleading, as no XDR is involved.
The exception is error code 28, AS_ERR_LOST_CONFLICT, which indicates that the client write lost to a conflicting write by XDR. This normally happens when bin convergence is enabled and XDR wrote a later version of the record than the client write.

There is one more case where this can happen even when there is no XDR involved. 
 

Explanation

The following conditions produce this error in a standalone cluster. When conflict-resolve-writes is set to true and XDR is configured to ship to a destination, the server internally delays writes that arrive in the same millisecond on the source, so that their last-update-times (LUTs) do not collide.

If conflict-resolve-writes is set to true and the namespace is strong consistency (SC), but XDR is not configured, the server does not introduce this delay and lets the writes go through, which is when the error appears.

This is specific to SC. If the namespace is AP, with the rest of the parameters unchanged, the error does not occur.

In short:
 
  • In SC, if an update arrives and the LUT it would get is earlier than the LUT of the existing record, the write is not let through. It is re-queued internally until the current time is equal to or greater than the existing record's LUT (or it times out if the wait is too long).

  • If the namespace is configured to ship via XDR, the write also waits, and nothing is shipped, when the LUT of the bins to be shipped would match the previously existing LUT.

  • In this case, XDR is not configured, so the write goes through with the same bin LUT as before.

When a write would set a bin's LUT to the same value as before, it fails if conflict-resolve-writes is turned on.

So this only happens when XDR is not configured to ship, the namespace runs in SC, and conflict-resolve-writes is turned on. It does not require the clock to go backwards; two writes in the same millisecond are enough.
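
The conditions above can be summarized as a small decision function. This is only an illustrative model of the behavior described in this article, not server code: the names `NamespaceConfig` and `write_outcome` and the outcome strings are hypothetical, and LUTs are reduced to integer milliseconds.

```python
from dataclasses import dataclass


@dataclass
class NamespaceConfig:
    # Hypothetical stand-ins for the namespace settings discussed above.
    strong_consistency: bool       # True for SC, False for AP
    conflict_resolve_writes: bool  # value of conflict-resolve-writes
    xdr_shipping: bool             # True if XDR is configured to ship


def write_outcome(cfg: NamespaceConfig, existing_lut_ms: int, now_ms: int) -> str:
    """Model what happens to a write, per the article's description.

    Returns "ok", "requeued" (internal delay), or "lost_conflict" (error 28).
    """
    if not cfg.strong_consistency:
        # The article states that AP namespaces do not show this error.
        return "ok"
    if now_ms < existing_lut_ms:
        # SC: wait until the clock reaches the existing record's LUT.
        return "requeued"
    if now_ms == existing_lut_ms and cfg.conflict_resolve_writes:
        # Same-millisecond write: delayed when XDR ships, rejected otherwise.
        return "requeued" if cfg.xdr_shipping else "lost_conflict"
    return "ok"


standalone_sc = NamespaceConfig(True, True, False)
print(write_outcome(standalone_sc, existing_lut_ms=1000, now_ms=1000))
```

In this model, only the combination of SC, conflict-resolve-writes enabled, no XDR shipping, and a matching millisecond yields the error.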

 


Solution

In this case, the solution is simply to set conflict-resolve-writes to false, as it is not required in a standalone cluster.
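
As a sketch, the setting sits in the namespace stanza of the server configuration file. The namespace name `test` is a placeholder, and the other namespace parameters are left out; adapt the fragment to the existing configuration.

```
namespace test {
    # Placeholder namespace; keep the existing parameters as they are.
    strong-consistency true
    conflict-resolve-writes false
}
```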

Notes

This behavior was introduced in 5.4.0.1 and was fixed in 6.1.0.25, 6.2.0.20, 6.3.0.13, and 6.4.0.7.
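
To check a server version against these numbers, a helper along the following lines could be used. It is a minimal sketch with two assumptions that this article does not state: versions are four dot-separated integers, and release lines newer than 6.4 already include the fix. Release lines from 5.4 through 6.0 have no fix listed here, so the sketch treats them as affected. The function name `is_affected` is hypothetical.

```python
# First fixed build for each release line listed in the note above.
FIXED_IN = {
    (6, 1): (6, 1, 0, 25),
    (6, 2): (6, 2, 0, 20),
    (6, 3): (6, 3, 0, 13),
    (6, 4): (6, 4, 0, 7),
}
INTRODUCED = (5, 4, 0, 1)


def is_affected(version: str) -> bool:
    """Return True if the version falls in the affected range."""
    v = tuple(int(part) for part in version.split("."))
    if v < INTRODUCED:
        return False
    branch = v[:2]
    if branch in FIXED_IN:
        return v < FIXED_IN[branch]
    # Assumption: lines newer than 6.4 carry the fix; lines 5.4 through 6.0
    # have no listed fix and are treated as affected.
    return branch < (6, 1)


print(is_affected("6.1.0.24"))
```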

Applies To Earliest Version

5.4