[SDKs & UI Libraries] Introduction. Problems and Solutions
With this post, I’m continuing publishing the v2 of my book dedicated to APIs. If you like this book, please rate it on GitHub, Amazon, or Goodreads and
Chapter 41. On the Content of This Section
As we mentioned previously, the term “SDK” (which stands for “Software Development Kit”) lacks concrete meaning. The common understanding is that an SDK differs from an API as it provides both program interfaces and tools to work with them. This definition is hardly satisfactory as today any technology is likely to be provided with a bundled toolset.
However, there is a very specific narrow definition of an SDK: it is a client library that provides a high-level interface (usually a native one) to some underlying platform (such as a client-server API). Most often, we talk about libraries for mobile OSes or Web browsers that work on top of a general-purpose HTTP API.
Among such client SDKs, one case is of particular interest to us: those libraries that not only provide programmatic interfaces to work with an API but also offer ready-to-use visual components for developers. A classic example of such an SDK is the UI libraries provided by cartographical services. Since developing a map engine, especially a vector one, is a very complex task, maps API vendors provide both “wrappers” to their HTTP APIs (such as a search function) and visual components to work with geographical entities. The latter often include general-purpose elements (such as buttons, placemarks, context menus, etc.) that can be used independently from the main functionality of the API.
This Section will be dedicated to these two types of program toolkits:
Client “wrappers” that work with client-server APIs
Client libraries that contain visual components with which end users might interact.
To avoid being wordy, we will use the term “SDK” for the former and “UI libraries” for the latter.
NB: Strictly speaking, a UI library might either include a client-server API “wrapper” or not (i.e., just provide a “pure” API to some underlying system engine). In this Section, we will mostly talk about the first option as it is the most general case and the most challenging one in terms of API design. Most SDK development patterns we will discuss are also applicable to “pure” libraries.
Chapter 42. SDKs: Problems and Solutions
The first question we need to clarify about SDKs (let us reiterate that we use this term to denote a native client library that allows for working with a technology-agnostic underlying client-server API) is why SDKs exist in the first place. In other words, why is using “wrappers” more convenient for frontend developers than working with the underlying API directly?
Several reasons are obvious:
Client-server protocols are usually designed to allow for the implementation of clients in any programming language. This implies that the data received from such an API will not be in the most convenient format. For example, there is no “datetime” type in JSON, and the dates need to be passed as strings. Similarly, most mainstream protocols don't support (de)serializing hash tables.
Most programming languages are imperative (and many of them are object-oriented) while most data formats are declarative. Working with raw data received from an API endpoint is inconvenient in terms of writing code. Client developers would prefer this data to be represented as objects (class instances).
Different programming languages imply different code styles (casing, namespace organization, etc.), while the practice of tailoring data formats in APIs to match the client's code style is very rare.
Platforms and/or programming languages usually prescribe how error handling should be organized — typically, through throwing exceptions or employing defer/panic techniques — which is hardly applicable to the concept of uniform APIs.
APIs are provided with instructions (human- or machine-readable) on how to repeat requests if the API endpoints are unavailable. This logic needs to be implemented by a client developer as client frameworks rarely provide it (and it would be very dangerous to automatically repeat potentially non-idempotent requests). Though this point might appear insignificant, it is in fact very important for every vendor of a popular API, as safeguards need to be installed to prevent API servers from overloading due to an uncontrollable spike of request repeats. This is achieved through:
Reading the
Retry-After
header and avoiding retrying the endpoint earlier than the time stated by the serverIntroducing exponentially growing intervals between consecutive requests.
This is what client developers should do regarding server errors, but they often skip this part, especially if they work for external partners.
Having an SDK would resolve these issues as they are, so to say, trivial: to fix them, the principles of working with the API aren't changed. For every request and response, we construct the corresponding SDK method, and we only need to set rules for doing this transformation, i.e., adapting platform-independent API formats to specific platforms. Additionally, this transformation usually could be automated.
However, there are also non-trivial problems we face while developing an SDK for a client-server API:
In client-server APIs, data is passed by value. To refer to some entities, specially designed identifiers need to be used. For example, if we have two sets of entities — recipes and offers — we need to build a map to understand which recipe corresponds to which offer:
// Request 'lungo' and 'latte' recipes
const recipes = await api
.getRecipes(['lungo', 'latte']);
// Build a map that allows to quickly
// find a recipe by its identifier
const recipeMap = new Map();
recipes.forEach((recipe) => {
recipeMap.set(recipe.id, recipe);
});
// Request offers for latte and lungo
// in the vicinity
const offers = await api.search({
recipes: ['lungo', 'latte'],
location
});
// To show offers to the user, we need
// to take the `recipe_id` in the offer,
// find the recipe description in the map
// and enrich the offer data with
// the recipe data
promptUser(
'What we have found',
offers.map((offer) => {
const recipe = recipeMap
.get(offer.recipe_id);
return {offer, recipe};
}));
This piece of code would be half as long if we received offers from the api.search
SDK method with a reference to a recipe:
// Request 'lungo' and 'latte' recipes
const recipes = await api
.getRecipes(['lungo', 'latte']);
// Request offers for latte and lungo
// in the vicinity
const offers = await api.search({
// Pass the references to the recipes,
// not their identifiers
recipes,
location
});
promptUser(
'What we have found',
// Offer already contains a reference
// to the recipe
offers
);
Client-server APIs are typically decomposed so that one response contains data regarding one kind of entity. Even if the endpoint is composite (i.e., allows for combining data from different sources depending on parameters), it is still the developer's responsibility to use these parameters. The code sample from the previous example would be even shorter if the SDK allowed for the initialization of all related entities:
// Request offers for latte and lungo
// in the vicinity
const offers = await api.search({
recipes: ['lungo', 'latte'],
location
});
// The SDK requested latte and lungo
// data from the `getRecipes` endpoint
// under the hood
promptUser(
'What we have found',
offers
);
The SDK can also populate program caches for the entities (if we do not rely on protocol-level caching) and/or allow for the “lazy” initialization of objects.
Generally speaking, storing pieces of data (such as authorization tokens, idempotency keys, draft identifiers in two-phase commits, etc.) between requests is the client's responsibility, and it is rather hard to formalize the rules. If an SDK takes responsibility for managing the data, there will be far fewer mistakes in application code.
Receiving callbacks in client-server APIs, even if it is a duplex communication channel, is rather inconvenient to work with and requires object mapping as well. Even if a push model is implemented, the resulting client code will be rather bulky:
// Retrieve ongoing orders
const orders = await api
.getOngoingOrders();
// Build order map
const orderMap = new Map();
orders.forEach((order) => {
orderMap.set(order.id, order);
});
// Subscribe to state change
// events
api.subscribe(
'order_state_change',
(event) => {
// Find the corresponding order
const order = orderMap
.get(event.order_id);
// Take some actions, like
// updating the UI
// in the application
UI.update(order);
}
);
If the API requires polling a state change endpoint, developers will additionally need to implement periodic requesting of the endpoint (and install safeguards to avoid overloading the server).
Meanwhile, this code sample already contains several mistakes:
First, the list of orders is requested, and then the state change listener is added. If some order changes its state between those two calls, the application would never learn about this fact.
If an event comes with an identifier of some unknown order (created from a different device or in a different execution thread), the map lookup operation will return an empty result, and the listener will throw an exception that is not properly handled anywhere.
Once again, we face a situation where an SDK lacking important features leads to mistakes in applications that use it. It would be much more convenient for a developer if an order object allowed for subscribing to its status updates without the need to learn how it works at the transport level and how to avoid missing an event.
const order = await api
.createOrder(…)
// No need to subscribe to
// the entire status change
// stream and filter it
.subscribe(
'state_change',
(event) => { … }
);
NB: This code relies on the idea that the order
is being updated in a consistent manner. Even if the state changes on the server side between the createOrder
and subscribe
calls, the order
object's state will be consistent with the state_change
events fired. Organizing this technically is the responsibility of the SDK developers.
Restoring after encountering business logic-bound errors is typically a complex procedure. As it can hardly be described in a machine-readable manner, client developers have to elaborate on the scenarios on their own.
// Request offers
const offers = await api
.search(…);
// The user selects an offer
const selectedOffer =
await promptUser(offers);
let order;
let offer = selectedOffer;
let numberOfTries = 0;
do {
// Trying to create an order
try {
numberOfTries++;
order = await api
.createOrder(offer, …);
} catch (e) {
// If the number of tries exceeded
// some reasonable limit, it's
// better to give up
if (numberOfTries > TRY_LIMIT) {
throw new NoRetriesLeftError();
}
// If the error is of the “offer
// expired” kind…
if (e.type ==
api.OfferExpiredError) {
// Trying to get a new offer
offer = await api
.renewOffer(offer);
} else {
// Other errors
…
}
}
} while (!order);
As we can see, the simple operation “try to renew an offer if needed” results in a bulky piece of code that is simultaneously error-prone and totally unnecessary as it doesn't introduce any new functionality visible to end users. For an application developer, it would be more convenient if this error (“offer expired”) were not exposed in the SDK, i.e., the SDK automatically renewed offers if needed.
Such situations also occur when working with APIs featuring eventual consistency or optimistic concurrency control — generally speaking, with any API where background errors are expected (which is rather a norm of life for client-server APIs). For frontend developers, writing code to implement policies like “read your writes” (i.e., passing tokens of the last known operation to subsequent queries) is essentially a waste of time.
Finally, one more important function that a customized SDK might fulfill is isolating the low-level API and changing the versioning paradigm. It is possible to fully conceal the underlying functionality (i.e., developers won't have direct access to the API) and ensure a certain freedom of working with the API inside the SDK up to seamlessly switching a major version. This approach for sure provides API vendors with more control over how partners' applications work. However, it requires investing more in developing the SDK and, more importantly, properly designing the SDK so that developers won't need to call the API directly due to the lack of access to some functionality or the inconvenience of working with the SDK. Moreover, the SDK itself should be robust enough to be able to handle the transition to a new major version of the API.
To summarize the above, a properly designed SDK, apart from maintaining consistency with the platform guidelines and providing “syntactic sugar,” serves three important purposes:
Lowering the number of mistakes in client code by implementing helpers that cover unobvious and poorly formalizable aspects of working with the API
Relieving client developers of the duty to write code that is absolutely irrelevant to the tasks they are solving
Giving an API vendor more control over integrations.
Code Generation
As we have seen, the list of tasks that an SDK developer faces (if they aim to create a quality product, of course) is quite considerable. Given that every target platform requires a separate SDK, it is not surprising that many API vendors seek to replace manual labor with automation.
One of the most potent approaches to such automation is code generation. It involves developing a technology that allows generating the SDK code in the target programming language for the target platform based on the API specification. Many modern data interchange protocols (gRPC, for instance) are shipped with generators of such ready-to-use clients in different programming languages. For other technologies (OpenAPI/Swagger, for instance) generators are being developed by the community.
Code generation allows for solving trivial problems such as adapting code style, (de)serializing complex types, etc. — in other words, resolving issues that are bound to the specifics of the platform, not the high-level business logic. The SDK developers can relatively inexpensively complement this machine “translation” with convenient helpers: realize automated repeating idempotent requests (with some retry time management policy), caching results, storing persistent data (such as authorization tokens) in the system storage, etc.
Such a generated SDK is usually referred to as a “client to an API.” The convenience of usage and the functional capabilities of code generation are so formidable that many API vendors restrict themselves to this technology, only providing their SDKs as generated clients.
However, for the aforementioned reasons, higher-level problems (such as receiving callbacks, dealing with business logic-bound errors, etc.) cannot be solved with code generation without writing some arbitrary code for the specific API. In the case of complex APIs with a non-trivial workcycle it is highly desirable that an SDK also solves high-level problems. Otherwise, an API vendor will end up with a bunch of applications using the API and making all the same “rookie mistakes.” This is, of course, not a reason to fully abolish code generation as it's quite convenient to use a generated client as a basis for developing a high-level SDK.
This is Chapters 41 and 42 of “The API” book being written by Sergey Konstantinov. I also have a book on the history of beer and historical beer styles, a Telegram channel on interesting classical music recordings, a travel photo blog on Unsplash, and a website with ranking fantasy & science fiction novels based on awards they received.