[The API Patterns] Partial Updates. Degradation and Predictability

May 26, 2023

With this post, I'm continuing to publish v2 of my book dedicated to APIs. With these two chapters, Section II, “The API Patterns,” is finished. If you like this book, please rate it on GitHub, Amazon, or Goodreads.

Chapter 24. Partial Updates

The case of partially applying a list of changes, described in the previous chapter, naturally leads us to the next typical API design problem. What if the operation involves low-level overwriting of several data fields rather than an atomic idempotent procedure (as in the case of changing the order status)? Let's take a look at the following example:

// Creates an order
// consisting of two beverages
POST /v1/orders/
X-Idempotency-Token: <token>
{
  "delivery_address",
  "items": [{
    "recipe": "lungo"
  }, {
    "recipe": "latte",
    "milk_type": "oat"
  }]
}
→
{ "order_id" }

// Partially updates the order
// by changing the volume
// of the second beverage
PATCH /v1/orders/{id}
{
  "items": [
    // `null` indicates
    // no changes for the
    // first beverage
    null,
    // list of properties
    // to change for
    // the second beverage
    {"volume": "800ml"}
  ]
}
→
{ /* Changes accepted */ }

This signature is inherently flawed, as its readability is dubious. What does the null first element in the array mean: deletion of an element or absence of changes? What will happen to the fields that are not passed (delivery_address, milk_type)? Will they be reset to default values or remain unchanged?

The most unpleasant thing here is that no matter which option you choose, your problems have just begun. Let's say we agree that the "items": [null, {…}] construct means the first array element remains untouched. Then how do we delete it if needed? Do we invent another “nullish” value specifically to denote removal? The same issue applies to field values: if skipping a field in a request means it should remain unchanged, then how do we reset it to the default value?

Partially updating a resource is one of the most frequent tasks that API developers have to solve, and unfortunately, it is also one of the most complicated. Attempts to take shortcuts and simplify the implementation often lead to numerous problems in the future.

A trivial solution is to always overwrite the requested entity completely, i.e., to require clients to pass the entire object, which fully replaces the current state, and to return the new state in response. However, this simple solution is frequently dismissed for several reasons:

- Increased request sizes and, consequently, higher traffic consumption.
- The necessity to detect which fields were actually changed in order to generate proper signals (events) for change listeners.
- The inability to support collaborative editing of the object, i.e., allowing two clients to edit different properties of the object in parallel: each client sends the full object state as it knows it and thus overwrites the other client's changes without being aware of them.
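To make the second and third points concrete, here is a minimal TypeScript sketch. The OrderState and Item types and the changedFields helper are hypothetical, not part of the API described in this chapter. It shows the diff the server has to compute when it receives a full state, and how two full overwrites based on the same snapshot silently lose one client's edit.

```typescript
// Hypothetical types loosely following the order examples
interface Item {
  recipe: string;
  volume?: string;
  milk_type?: string;
}

interface OrderState {
  delivery_address: string;
  items: Item[];
}

// With full overwrites the server receives the whole new state
// and must find the changed top-level fields itself in order to
// notify change listeners. Comparing JSON strings is a shortcut
// that is sensitive to key order in nested objects.
function changedFields(prev: OrderState, next: OrderState): string[] {
  const changed: string[] = [];
  for (const key of Object.keys(next) as (keyof OrderState)[]) {
    if (JSON.stringify(prev[key]) !== JSON.stringify(next[key])) {
      changed.push(key);
    }
  }
  return changed;
}

const before: OrderState = {
  delivery_address: "1 Main St",
  items: [{ recipe: "lungo" }, { recipe: "latte", milk_type: "oat" }],
};

// Only the latte's volume changed, but everything is resent
const after: OrderState = {
  delivery_address: "1 Main St",
  items: [
    { recipe: "lungo" },
    { recipe: "latte", milk_type: "oat", volume: "800ml" },
  ],
};

console.log(changedFields(before, after)); // [ 'items' ]

// Lost update: clients A and B both start from `before`,
// each edits a different field and PUTs its full snapshot
let stored: OrderState = before;
stored = { ...before, delivery_address: "2 Oak Ave" }; // A's write
stored = { ...before, items: after.items }; // B's write
// B never saw A's change, so A's address is silently reverted
console.log(stored.delivery_address); // 1 Main St
```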

To avoid these issues, developers sometimes implement a naïve solution:

- Clients only pass the fields that have changed.
- To reset the values of certain fields and to delete or skip array elements, some “special” values are used.

A full example of an API implementing the naïve approach would look like this:

// Partially rewrites the order:
//   * resets the delivery address
//     to its default value
//   * leaves the first beverage
//     intact
//   * removes the second beverage
PATCH /v1/orders/{id}
{
  // “Special” value #1:
  // reset the field
  "delivery_address": null,
  "items": [
    // “Special” value #2:
    // do nothing to the entity
    {}, 
    // “Special” value #3:
    // delete an entity
    false
  ]
}
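One way a server might interpret the naïve format is sketched below in TypeScript. The types, the applyNaivePatch function, and the placeholder default address are hypothetical; only the meaning of the special values (null resets a field, {} leaves an item intact, false deletes it) is taken from the example above.

```typescript
interface Item {
  recipe: string;
  volume?: string;
  milk_type?: string;
}

interface Order {
  delivery_address: string;
  items: Item[];
}

// An array element is either a partial item (fields to
// overwrite, `{}` meaning "leave intact") or `false` ("delete")
type ItemPatch = Partial<Item> | false;

interface OrderPatch {
  // `null` is the "reset to default" marker
  delivery_address?: string | null;
  items?: ItemPatch[];
}

// Hypothetical placeholder for the server-side default
const DEFAULT_ADDRESS = "<default address>";

function applyNaivePatch(order: Order, patch: OrderPatch): Order {
  const result: Order = { ...order };
  if (patch.delivery_address === null) {
    // Special value #1: reset to the default
    result.delivery_address = DEFAULT_ADDRESS;
  } else if (patch.delivery_address !== undefined) {
    result.delivery_address = patch.delivery_address;
  }
  const itemPatches = patch.items;
  if (itemPatches !== undefined) {
    const items: Item[] = [];
    order.items.forEach((item, i) => {
      const p: ItemPatch | undefined = itemPatches[i];
      // Special value #3: drop the item
      if (p === false) return;
      // Special value #2 (`{}`) or a missing element keeps the
      // item as is; other objects overwrite the listed fields
      items.push({ ...item, ...(p ?? {}) });
    });
    result.items = items;
  }
  return result;
}

const order: Order = {
  delivery_address: "1 Main St",
  items: [{ recipe: "lungo" }, { recipe: "latte", milk_type: "oat" }],
};

const patched = applyNaivePatch(order, {
  delivery_address: null,
  items: [{}, false],
});
console.log(JSON.stringify(patched));
// {"delivery_address":"<default address>","items":[{"recipe":"lungo"}]}
```

Note that this format still cannot express “reset the volume of the second beverage”: a missing field already means “unchanged,” and null inside an item is not one of the agreed special values, so yet another marker would have to be invented.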

This solution allegedly solves the aforementioned problems:

- Traffic consumption is reduced as only the changed fields are transmitted, and unchanged entities are fully omitted (in our case, replaced with the special value {}).
- Notifications regarding state changes will only be generated for the fields and entities passed in the request.
- If two clients edit different fields, no access conflict arises and both sets of changes are applied.

However, upon closer examination, all these conclusions seem less viable:

- We have already described the reasons for increased traffic consumption (excessive polling, lack of pagination and/or field size restrictions) in the “Describing Final Interfaces” chapter, and these issues have nothing to do with passing extra fields (and if they do, it implies that a separate endpoint for “heavy” data is needed).
- The concept of passing only the fields that have actually changed shifts the burden of detecting which fields have changed onto the client developers' shoulders:
  - Not only does the complexity of implementing the comparison algorithm remain unchanged, but we also run the risk of having several independent implementations.
  - The client's ability to calculate these diffs doesn't relieve the server developers of the duty to do the same, as client developers might make mistakes or overlook certain aspects.
- Finally, the naïve approach of organizing collaborative editing by allowing conflicting operations to be carried out if they don't touch the same fields works only if the changes are transitive. In our case, they are not: the result of simultaneously removing the first element in the list and editing the second one depends on the execution order.
  - Often, developers also try to reduce outgoing traffic by returning an empty server response for modifying operations. As a result, two clients editing the same entity do not see each other's changes until they explicitly refresh the state, which further increases the chance of highly unexpected results.
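The order dependence mentioned in the last point can be reproduced in a few lines. In this hypothetical TypeScript sketch, one client deletes the element at index 0 while another sets the volume of the element at index 1, mirroring the positional addressing of the naïve format.

```typescript
interface Item {
  recipe: string;
  volume?: string;
}

type Op = (items: Item[]) => Item[];

// Client 1: delete the first beverage (addressed by index)
const deleteFirst: Op = (items) => items.filter((_, i) => i !== 0);

// Client 2: set the volume of the second beverage (by index)
const setSecondVolume: Op = (items) =>
  items.map((item, i) => (i === 1 ? { ...item, volume: "800ml" } : item));

const initial: Item[] = [{ recipe: "lungo" }, { recipe: "latte" }];

// Edit first, then delete: the latte gets its new volume
const editThenDelete = deleteFirst(setSecondVolume(initial));
// Delete first, then edit: the latte has moved to index 0,
// so index 1 matches nothing and the edit is silently lost
const deleteThenEdit = setSecondVolume(deleteFirst(initial));

console.log(JSON.stringify(editThenDelete)); // [{"recipe":"latte","volume":"800ml"}]
console.log(JSON.stringify(deleteThenEdit)); // [{"recipe":"latte"}]
```

With three or more beverages, the second ordering would not merely drop the edit: it would apply the new volume to the wrong beverage.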

A more consistent solution is to split an endpoint into several idempotent sub-endpoints, each having its own independent identifier and/or address (which is usually enough to ensure the transitivity of independent operations). This approach aligns well with the decomposition principle we discussed in the “Isolating Responsibility Areas” chapter.

// Creates an order
// comprising two beverages
POST /v1/orders/
{
  "parameters": {
    "delivery_address"
  },
  "items": [{
    "recipe": "lungo"
  }, {
    "recipe": "latte",
    "milk_type": "oat"
  }]
}
→
{
  "order_id", 
  "created_at",
  "parameters": {
    "delivery_address"
  },
  "items": [
    { "item_id", "status"}, 
    { "item_id", "status"}
  ]
}

// Changes the parameters
// of the order
PUT /v1/orders/{id}/parameters
{ "delivery_address" }
→
{ "delivery_address" }

// Partially changes the order
// by rewriting the parameters
// of the second beverage
PUT /v1/orders/{id}/items/{item_id}
{ 
  // All the fields are passed,
  // even if only one has changed
  "recipe", "volume", "milk_type" 
}
→
{ "recipe", "volume", "milk_type" }

// Deletes one of the beverages
DELETE /v1/orders/{id}/items/{item_id}

Now, to reset the volume field, it is enough to omit it from the PUT /v1/orders/{id}/items/{item_id} request. Also note that the operations of removing one beverage and editing another one have become transitive.
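The effect of addressing items by identifier rather than by position can be checked with a small TypeScript sketch. The deleteItem and putItem functions are hypothetical stand-ins for the DELETE and PUT handlers above, and a plain object keyed by item_id stands in for storage.

```typescript
interface Item {
  recipe: string;
  volume?: string;
  milk_type?: string;
}

// Items keyed by `item_id`
type Items = Record<string, Item>;

// DELETE /v1/orders/{id}/items/{item_id}
function deleteItem(items: Items, itemId: string): Items {
  const next: Items = { ...items };
  delete next[itemId];
  return next;
}

// PUT /v1/orders/{id}/items/{item_id}: the body is the item's
// full new state, so omitting `volume` resets it. A real API
// would likely respond with 404 for an unknown item; this
// sketch just leaves the collection unchanged.
function putItem(items: Items, itemId: string, body: Item): Items {
  if (!(itemId in items)) return items;
  return { ...items, [itemId]: { ...body } };
}

const initial: Items = {
  "item-1": { recipe: "lungo" },
  "item-2": { recipe: "latte", milk_type: "oat", volume: "800ml" },
};

// The second client resets `volume` by simply not passing it
const newLatte: Item = { recipe: "latte", milk_type: "oat" };

const deleteThenPut = putItem(deleteItem(initial, "item-1"), "item-2", newLatte);
const putThenDelete = deleteItem(putItem(initial, "item-2", newLatte), "item-1");

// Both execution orders produce the same state
console.log(JSON.stringify(deleteThenPut) === JSON.stringify(putThenDelete)); // true
```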

This approach also allows for separating read-only and calculated fields (such as created_at and status) from the editable ones without creating ambiguous situations (such as what should happen if the client tries to modify the created_at field).

Applying this pattern is typically sufficient for most APIs that manipulate composite entities. However, it comes at a price: it sets high standards for designing the decomposed interfaces (otherwise a once-neat API will crumble as it expands), and replacing a significant subset of the entity's fields requires making many requests (which implies exposing the functionality of applying bulk changes, the undesirability of which we discussed in the previous chapter).

NB: while decomposing endpoints, it's tempting to split editable and read-only data. The latter might then be cached for a long time, and there will be no need for sophisticated list iteration techniques. The plan looks great on paper; however, as the API expands, immutable data often ceases to be immutable, which can only be solved by creating new versions of the interfaces. We recommend explicitly declaring some data non-modifiable only in one of the following two cases: either (1) it really cannot become editable without breaking backward compatibility, or (2) the reference to the resource (such as, let's say, a link to an image) is fetched via the API itself and you can make these links persistent (i.e., if the image is updated, a new link is generated instead of overwriting the content the old one points to).

Resolving Conflicts of Collaborative Editing

The idea of applying changes to a resource state through independent atomic idempotent operations looks attractive as a conflict resolution technique as well. As subcomponents of the resource are fully overwritten, it is guaranteed that the result of applying the changes will be exactly what the user saw on the screen of their device, even if they had observed an outdated version of the resource. However, this approach helps very little if we need high granularity of data editing, as implemented in modern services for collaborative document editing and in version control systems (we would need to implement endpoints with the same level of granularity, literally one for each symbol in the document).

To make true collaborative editing possible, a specifically designed format for describing changes needs to be implemented. It must allow for:

- ensuring maximum granularity (each operation corresponds to one distinct user action);
- implementing conflict resolution policies.

In our case, we might take this direction:

POST /v1/orders/{id}/changes
X-Idempotency-Token: <token>
{
  // The revision the client
  // observed when making
  // the changes
  "known_revision",
  "changes": [{
    "type": "set",
    "field": "delivery_address",
    "value": <new value>
  }, {
    "type": "unset_item_field",
    "item_id",
    "field": "volume"
  }],
  …
}

This approach is much more complex to implement, but it is the only viable technique for realizing collaborative editing, as it explicitly reflects the exact actions the client applied to an entity. Having the changes in this format also allows for organizing offline editing: changes accumulate on the client side, and the server resolves conflicts later based on the revision history.
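A possible server-side reading of this format is sketched below in TypeScript. It covers only the two change types from the example; the Order, Change, and ApplyResult types and the applyChanges function are hypothetical. The conflict policy (reject any request whose known_revision is stale) is a deliberately simple assumption: a real collaborative editor would more likely try to rebase the changes onto the newer revisions.

```typescript
interface Item {
  item_id: string;
  recipe: string;
  volume?: string;
  milk_type?: string;
}

interface Order {
  revision: number;
  delivery_address: string;
  items: Item[];
}

// One variant per kind of user action
type Change =
  | { type: "set"; field: "delivery_address"; value: string }
  | { type: "unset_item_field"; item_id: string; field: "volume" | "milk_type" };

interface ChangesRequest {
  known_revision: number;
  changes: Change[];
}

type ApplyResult =
  | { ok: true; order: Order }
  | { ok: false; error: "revision_conflict"; actual_revision: number };

function applyChanges(order: Order, req: ChangesRequest): ApplyResult {
  // Simplest possible policy: refuse changes made on a stale revision
  if (req.known_revision !== order.revision) {
    return { ok: false, error: "revision_conflict", actual_revision: order.revision };
  }
  // Work on a copy so the stored order stays untouched
  const next: Order = {
    ...order,
    items: order.items.map((item) => ({ ...item })),
  };
  for (const change of req.changes) {
    switch (change.type) {
      case "set":
        next.delivery_address = change.value;
        break;
      case "unset_item_field":
        for (const item of next.items) {
          if (item.item_id === change.item_id) {
            delete item[change.field];
          }
        }
        break;
    }
  }
  next.revision = order.revision + 1;
  return { ok: true, order: next };
}

const order: Order = {
  revision: 7,
  delivery_address: "1 Main St",
  items: [{ item_id: "i2", recipe: "latte", volume: "800ml" }],
};

const res = applyChanges(order, {
  known_revision: 7,
  changes: [
    { type: "set", field: "delivery_address", value: "2 Oak Ave" },
    { type: "unset_item_field", item_id: "i2", field: "volume" },
  ],
});
```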

NB: one approach to this task is developing a set of operations in which all actions are transitive (i.e., the final state of the entity does not change regardless of the order in which the changes were applied). One example of such a nomenclature is CRDTs (conflict-free replicated data types). However, we consider this approach viable only in some subject areas, as in real life, non-transitive changes are always possible. If one user entered new text in the document and another user removed the document completely, there is no way to automatically resolve this conflict in a manner that would satisfy both users. The only correct way of resolving this conflict is to explicitly ask users which option for mitigating the issue they prefer.
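For illustration, here is a grow-only counter (G-Counter), one of the simplest state-based CRDTs, sketched in TypeScript. Each replica increments only its own slot, and merging takes the per-replica maximum; this merge is commutative, associative, and idempotent, so replicas converge regardless of the order in which states arrive. The replica names are hypothetical.

```typescript
// Replica name -> number of increments made by that replica
type GCounter = Record<string, number>;

function increment(c: GCounter, replica: string, by = 1): GCounter {
  return { ...c, [replica]: (c[replica] ?? 0) + by };
}

// Per-replica maximum: order of merging does not matter
function merge(a: GCounter, b: GCounter): GCounter {
  const result: GCounter = { ...a };
  for (const replica of Object.keys(b)) {
    result[replica] = Math.max(result[replica] ?? 0, b[replica]);
  }
  return result;
}

function value(c: GCounter): number {
  return Object.keys(c).reduce((sum, replica) => sum + c[replica], 0);
}

// Two replicas increment concurrently, then exchange states
const alice = increment({}, "alice", 2);
const bob = increment({}, "bob", 1);

console.log(value(merge(alice, bob))); // 3
console.log(value(merge(bob, alice))); // 3
```

Such a structure also illustrates the limitation described above: its merge rule only works because the counter can never decrease, and there is no merge rule under which both “delete the document” and “append text to it” survive.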

Chapter 25. Degradation and Predictability

In the previous chapters, we have repeatedly discussed that a background level of errors is not just unavoidable; in many cases, APIs are deliberately designed to tolerate errors to make the system more scalable and predictable.

But let's ask ourselves a question: what does a “more predictable system” mean? For an API vendor, the answer is simple: the distribution and number of errors are both indicators of technical problems (if the numbers are growing unexpectedly) and KPIs for technical refactoring (if the numbers are decreasing after the release).

However, for partner developers, “API predictability” means something completely different: how solidly they can cover the API use cases (both happy and unhappy paths) in their code. In other words, how well one can understand, based on the documentation and the nomenclature of API methods, what errors might arise during the API work cycle and how to handle them.

Why is optimistic concurrency control better than acquiring locks from the partner's point of view? Because if the revision conflict error is received, it's obvious to a developer what to do about it: update the state and try again (the easiest approach is to show the new state to the end user and ask them what to do next). But if the developer can't acquire a lock in a reasonable time then… what useful action can they take? Retrying most certainly won't change anything. Show something to the user… but what exactly? An endless spinner? Ask the user to make a decision — give up or wait a bit longer?
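From the partner's side, the recovery path for a revision conflict might look like the following TypeScript sketch. FakeServer, the WriteResult shape, and the resolve callback (standing in for “show the new state to the user and ask what to do”) are hypothetical, not part of any real API.

```typescript
interface Doc {
  revision: number;
  text: string;
}

type WriteResult =
  | { ok: true; doc: Doc }
  | { ok: false; error: "revision_conflict"; current: Doc };

// In-memory stand-in for an API with optimistic concurrency
class FakeServer {
  private doc: Doc = { revision: 1, text: "" };

  read(): Doc {
    return { ...this.doc };
  }

  // Accepts the write only if the client saw the latest revision
  write(knownRevision: number, text: string): WriteResult {
    if (knownRevision !== this.doc.revision) {
      return { ok: false, error: "revision_conflict", current: this.read() };
    }
    this.doc = { revision: this.doc.revision + 1, text };
    return { ok: true, doc: this.read() };
  }

  // Simulates another client's concurrent edit
  externalEdit(text: string): void {
    this.doc = { revision: this.doc.revision + 1, text };
  }
}

// On conflict the next step is obvious: take the fresh state,
// decide again (here via `resolve`), and retry
function saveWithRetry(
  server: FakeServer,
  seen: Doc,
  desired: string,
  resolve: (current: Doc, desired: string) => string | null,
  maxAttempts = 3,
): Doc | null {
  let base = seen;
  let text = desired;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = server.write(base.revision, text);
    if (res.ok) return res.doc;
    const decided = resolve(res.current, text);
    if (decided === null) return null; // the user gave up
    base = res.current;
    text = decided;
  }
  return null;
}

const server = new FakeServer();
const seen = server.read(); // revision 1
server.externalEdit("edited by someone else"); // now revision 2
const saved = saveWithRetry(server, seen, "my text", (current, mine) =>
  // e.g., the user chose to append their text to the fresh version
  `${current.text}\n${mine}`,
);
console.log(saved?.revision); // 3
```

The retry limit is an arbitrary choice here; the point is that every branch has an obvious next step, which is exactly what a lock that cannot be acquired fails to provide.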

While designing the API behavior, it's extremely important to imagine yourself in the partner developer's shoes and consider the code they must write to solve the arising issues (including timeouts and backend unavailability). This book comprises many specific tips on typical problems; however, you need to think about atypical ones on your own.

Here are some general pieces of advice that might come in handy:

- If you can include recommendations on resolving the error in the error response itself, do it unconditionally (but keep in mind there should be two sets of recommendations: one for the user who will see the message in the application and one for the developer who will find it in the logs).
- If errors emitted by some endpoint are not critical for the main functionality of the integration, explicitly describe this fact in the documentation, as developers might not think to wrap the corresponding code in a try-catch block. Providing code samples and guidance on what default value or behavior to use in case of an error is even better.
- Remember that no matter how exquisite and comprehensive your error nomenclature is, a developer can always encounter a transport-level error or a network timeout, which means they need to restore the application state when the tips from the backend are not available. There should be an obvious default sequence of steps to handle unknown problems.
- Finally, when introducing new types of errors, don't forget about old clients that are unaware of these new errors. The aforementioned “default reaction” to obscure issues should cover these new scenarios.
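The advice above can be sketched on the client side in TypeScript. The error codes, the recommendations field, and the withFallback helper are hypothetical; the essential part is the final return in react, which gives transport errors, timeouts, and error codes added after the client was released the same default reaction.

```typescript
// Hypothetical error shape with two sets of recommendations
interface ApiError {
  kind: "api";
  code: string;
  recommendations?: { user?: string; developer?: string };
}

type Failure = ApiError | { kind: "transport" } | { kind: "timeout" };

type Reaction =
  | { action: "refresh_and_retry"; message: string }
  | { action: "show_message"; message: string };

const DEFAULT_MESSAGE = "Something went wrong, please try again later";

function react(failure: Failure): Reaction {
  if (failure.kind === "api") {
    // Developer-facing advice goes to the logs
    const devHint = failure.recommendations?.developer;
    if (devHint) console.warn(devHint);
    switch (failure.code) {
      // Codes known to this client version
      case "revision_conflict":
        return { action: "refresh_and_retry", message: "The order was changed elsewhere" };
      case "invalid_address":
        return {
          action: "show_message",
          message: failure.recommendations?.user ?? "Please check the delivery address",
        };
    }
  }
  // Default reaction: transport errors, timeouts, and codes
  // introduced after this client was released all end up here
  return { action: "show_message", message: DEFAULT_MESSAGE };
}

// For a non-critical endpoint, swallow the error and use the
// documented default so that the main flow keeps working
function withFallback<T>(call: () => T, fallback: T): T {
  try {
    return call();
  } catch {
    return fallback;
  }
}

// e.g., a hypothetical "suggested beverages" call fails
const suggestions = withFallback<string[]>(() => {
  throw new Error("endpoint unavailable");
}, []);
```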

In an ideal world, to help partners “degrade properly,” a meta-API should exist, allowing for determining the status of the endpoints of the main API. This way, partners would be able to automatically enable fallbacks if some functionality is unavailable. In the real world, alas, if a widespread outage occurs, APIs for checking the status of APIs are commonly unavailable as well.

These are Chapters 24 and 25 of “The API,” a book being written by Sergey Konstantinov. I also have a book on the history of beer and historical beer styles, a Telegram channel on interesting classical music recordings, a travel photo blog on Unsplash, and a website ranking fantasy & science fiction novels based on the awards they received.

