[The API Patterns] Eventual Consistency

Apr 28, 2023

With this post, I’m continuing publishing the v2 of my book dedicated to APIs. If you like this book, please rate it on GitHub, Amazon, or Goodreads and

Chapter 18. Eventual Consistency

The approach described in the previous chapter is in fact a trade-off: the API performance issues are traded for “normal” (i.e., expected) background errors that happen while working with the API. This is achieved by isolating the component responsible for controlling concurrency and only exposing read-only tokens in the public API. Still, the achievable throughput of the API is limited, and the only way of scaling it up is removing the strict consistency from the external API and thus allowing reading system state from read-only replicas:

// Reading the state,
// possibly from a replica
const orderState = 
  await api.getOrderState();
const version = 
  orderState.latestVersion;
try {
  // The request handler will
  // read the actual version
  // from the master data
  const task = await api
    .createOrder(version, …);
} catch (e) {
  …
}

As orders are created much more rarely than read, we might significantly increase the system performance if we drop the requirement of returning the most recent state of the resource from the state retrieval endpoints. The versioning will help us avoid possible problems: creating an order will still be impossible unless the client has the actual version. In fact, we transited to the eventual consistency model: the client will be able to fulfill its request sometime when it finally gets the actual data. In modern microservice architectures, eventual consistency is rather an industrial standard, and it might be close to impossible to achieve the opposite, i.e., strict consistency.

NB: let us stress that you might choose the approach only in the case of exposing new APIs. If you're already providing an endpoint implementing some consistency model, you can't just lower the consistency level (for instance, introduce eventual consistency instead of the strict one) even if you never documented the behavior. This will be discussed in detail in the “On the Waterline of the Iceberg” chapter of “The Backwards Compatibility” section of this book.

Choosing weak consistency instead of a strict one, however, brings some disadvantages. For instance, we might require partners to wait until they get the actual resource state to make changes — but it is quite unobvious for partners (and actually inconvenient) they must be prepared to wait for changes they made themselves to propagate.

// Creates an order
const api = await api
  .createOrder(…)
// Returns a list of orders
const pendingOrders = await api.
  getOngoingOrders(); // → []
  // The list is empty

If strict consistency is not guaranteed, the second call might easily return an empty result as it reads data from a replica, and the newest order might not have hit it yet.

An important pattern that helps in this situation is implementing the “read-your-writes” model, i.e., guaranteeing that clients observe the changes they have just made. The consistency might be lifted to the read-your-writes level by making clients pass some token that describes the last changes known to the client.

const order = await api
  .createOrder(…);
const pendingOrders = await api.
  getOngoingOrders({
    …,
    // Pass the identifier of the
    // last operation made by
    // the client
    last_known_order_id: order.id
  })

Such a token might be:

an identifier (or identifiers) of the last modifying operations carried out by the client;
the last known resource version (modification date, ETag) known to the client.

Upon getting the token, the server must check that the response (e.g., the list of ongoing operations it returns) matches the token, i.e., the eventual consistency converged. If it did not (the client passed the modification date / version / last order id newer than the one known to the server), one of the following policies or their combinations might be applied:

the server might repeat the request to the underlying DB or to the other kind of data storage in order to get the newest version (eventually);
the server might return an error that requires the client to try again later;
the server queries the main node of the DB, if such a thing exists, or otherwise initiates retrieving the master data.

The advantage of this approach is client development convenience (compared to the absence of any guarantees): by preserving the version token, client developers get rid of the possible inconsistency of the data got from API endpoints. There are two disadvantages, however:

it is still a trade-off between system scalability and a constant inflow of background errors;
- if you're querying master data or repeating the request upon the version mismatch, the load on the master storage is increased in poorly a predictable manner;
- if you return a client error instead, the number of such errors might be considerable, and partners will need to write some additional code to deal with the errors;
this approach is still probabilistic, and will only help in a limited number of use cases (to be discussed below).

There is also an important question regarding the default behavior of the server if no version token was passed. Theoretically, in this case, master data should be returned, as the absence of the token might be the result of an app crash and subsequent restart or corrupted data storage. However, this implies an additional load on the master node.

Evaluating the Risks of Switching to Eventual Consistency

Let us state an important assertion: the methods of solving architectural problems we're discussing in this section are probabilistic. Abolishing strict consistency means that, even if all components of the system work perfectly, client errors will still occur — and we may only try to lessen their numbers for typical usage profiles.

NB: the “typical usage profile” stipulation is important: an API implies the variability of client scenarios, and API usage cases might fall into several groups, each featuring quite different error profiles. The classical example is client APIs (where it's an end user who makes actions and waits for results) versus server APIs (where the execution time is per se not so important — but let's say mass parallel execution might be). If this happens, it's a strong signal to make a family of API products covering different usage scenarios, as we will discuss in “The API Services Range” chapter of “The API Product” section of this book.

Let's return to the coffee example, and imagine we implemented the following scheme:

optimistic concurrency control (through, let's say, the id of the last user's order)
the “read-your-writes” policy of reading the order list (again with passing the last known order id as a token)
retrieving master data in the case the token is absent.

In this case, the order creation error might only happen in one of the two cases:

the client works with the data incorrectly (does not preserve the identifier of the last order or the idempotency key while repeating the request)
the client tries to create an order from two different instances of the app that do not share the common state.

The first case means there is a bug in the partner's code; the second case means that the user is deliberately testing the system's stability — which is hardly a frequent case (or, let's say, the user's phone went off and they quickly switched to a tablet — rather rare case as well, we must admit).

Let's now imagine that we dropped the third requirement — i.e., returning the master data if the token was not provided by the client. We would get the third case when the client gets an error:

the client application lost some data (restarted or corrupted), and the user tries to replicate the last request.

NB: the repeated request might happen without any automation involved if, let's say, the user got bored of waiting, killed the app and manually re-orders the coffee again.

Mathematically, the probability of getting the error is expressed quite simply. It's the ratio between two durations: the time period needed to get the actual state to the time period needed to restart the app and repeat the request. (Keep in mind that the last failed request might be automatically repeated on startup by the client.) The former depends on the technical properties of the system (for instance, on the replication latency, i.e., the lag between the master and its read-only copies) while the latter depends on what client is repeating the call.

If we talk about applications for end users, the typical restart time there is measured in seconds, which normally should be much less than the overall replication latency. Therefore, client errors will only occur in case of data replication problems / network issues / server overload.

If, however, we talk about server-to-server applications, the situation is totally different: if a server repeats the request after a restart (let's say because the process was killed by a supervisor), it's typically a millisecond-scale delay. And that means that the number of order creation errors will be significant.

As a conclusion, returning eventually consistent data by default is only viable if an API vendor is either ready to live with background errors or capable of making the lag of getting the actual state much less than the typical app restart time.

This is Chapter 18 of “The API” book being written by Sergey Konstantinov. I also have a book on the history of beer and historical beer styles, a Telegram channel on interesting classical music recordings, and a travel photo blog on Unsplash.

Discussion about this post

Ready for more?