[HTTP APIs & REST] Concept and Terminology. The REST Myth. Components of a HTTP Request
With this post, I’m continuing to publish v2 of my book dedicated to APIs. If you like this book, please rate it on GitHub, Amazon, or Goodreads.
Chapter 33. On the HTTP API Concept and Terminology
The problem of designing HTTP APIs is unfortunately one of the most “holywar”-inspiring issues. On one hand, it is one of the most popular technologies but, on the other hand, it is quite complex and difficult to comprehend due to the large and fragmented standard split into many RFCs. As a result, the HTTP specification is doomed to be poorly understood and imperfectly interpreted by millions of software engineers and thousands of textbook writers. Therefore, before proceeding to the useful part of this Section, we must clarify exactly what we are going to discuss.
It has somehow happened that the entire modern network stack used for developing client-server APIs has been unified in two important points. One of them is the Internet Protocol Suite, which comprises the IP protocol as a base and an additional layer on top of it in the form of either the TCP or UDP protocol. Today, alternatives to the TCP/IP stack are used for a very limited subset of engineering tasks.
However, from a practical standpoint, there is a significant inconvenience that makes using raw TCP/IP protocols much less practical. They operate over IP addresses which are poorly suited for organizing distributed systems:
Firstly, humans are not adept at remembering IP addresses and prefer readable names
Secondly, an IP address is a technical entity bound to a specific network node while developers require the ability to add or modify nodes without having to modify the code of their applications.
The domain name system, which allows for assigning human-readable aliases to IP addresses, has proved to be a convenient abstraction with almost universal adoption. Introducing domain names necessitated the development of new protocols at a higher level than TCP/IP. For text (hypertext) data this protocol happened to be HTTP 0.9 developed by Tim Berners-Lee and published in 1991. Besides enabling the use of network node names, HTTP also provided another useful abstraction: assigning separate addresses to endpoints working on the same network node.
Initially, the protocol was very simple and merely described a method of retrieving a document by establishing a TCP/IP connection to the server and passing a string in the `GET document_address` format. Subsequently, the protocol was enhanced by the URL standard for document addresses. After that, the protocol evolved rapidly: new verbs, response statuses, headers, data types, and other features emerged in a short time.
HTTP was developed to transfer hypertext, which is poorly suited for developing program interfaces. However, loose HTML quickly evolved into strict and machine-readable XML, which became one of the most widespread standards for describing API calls. Starting from the 2000s, XML was gradually replaced by the much simpler and more interoperable JSON. Today, when we talk about HTTP APIs, we usually refer to interfaces for transmitting data and executing remote calls in JSON format over the HTTP protocol.
On one hand, HTTP was a simple and easily understandable protocol to make arbitrary calls to remote servers using their domain names. On the other hand, it quickly gained a wide range of extensions beyond its base functionality. Eventually, HTTP became another “attractor” where all the network technology stacks converge. Most API calls within TCP/IP networks are made through the HTTP protocol. However, unlike the TCP/IP case, it is each developer's own choice which parts of the functionality provided by the HTTP protocol and its numerous extensions they are going to use. For example, gRPC and GraphQL work on top of HTTP but employ a limited subset of its capabilities.
However, the term “HTTP API” is not always a synonym for “any API that utilizes the HTTP protocol.” When we refer to HTTP APIs, we rather imply that HTTP is used not as an additional quasi-transport layer protocol (as happens in the case of gRPC and GraphQL) but as an application-level protocol, meaning its components (such as URLs, headers, HTTP verbs, status codes, caching policies, etc.) are used according to their respective semantics. We also likely imply that some textual data format (JSON or XML) is used to describe procedure calls.
In this Section, we will discuss client-server APIs with the following properties:
The interaction protocol is HTTP version 1.1 or higher
The data format is JSON (excluding endpoints specifically designed to provide data in other formats, usually files)
The endpoints (resources) are identified by their URLs in accordance with the standard
The semantics of HTTP calls match the specification
None of the Web standards are intentionally violated.
We will refer to such APIs as “HTTP APIs” or “JSON-over-HTTP APIs.” We understand that this is a loose interpretation of the term, but we prefer to live with that rather than using the phrase “JSON-over-HTTP endpoints utilizing the semantics described in the HTTP and URL standards” each time.
Chapter 34. The REST Myth
Before we proceed to discuss HTTP API design patterns, we feel obliged to clarify one more important terminological issue. Often, an API matching the description we gave in the previous chapter is called a “REST API” or a “RESTful API.” In this Section, we don't use any of these terms as it makes no practical sense.
What is “REST”? In 2000, Roy Fielding, one of the authors of the HTTP and URI specifications, published his doctoral dissertation titled “Architectural Styles and the Design of Network-based Software Architectures,” the fifth chapter of which was named “Representational State Transfer (REST).”
As anyone can attest by reading this chapter, it features a rather abstract overview of a distributed client-server architecture that is not bound to either HTTP or URL. Furthermore, it does not discuss any API design recommendations. In this chapter, Fielding methodically enumerates restrictions that any software engineer encounters when developing distributed client-server software. Here they are:
The client and the server do not know how each of them is implemented
Sessions are stored on the client (the “stateless” constraint)
Data must be marked as cacheable or non-cacheable
Interaction interfaces between system components must be uniform
Network-based systems are layered, meaning every server may just be a proxy to another server
The functionality of the client might be enhanced by the server providing code on demand.
That's it. With this, the REST definition is over. Fielding further concretizes some implementation aspects of systems under the stated restrictions. However, all these clarifications are no less abstract. Literally, the key abstraction for the REST architectural style is “resource”; any data that can have a name may be a resource.
The key conclusion that we might draw from the Fielding-2000 definition of REST is, generally speaking, that any networking software in the world complies with the REST constraints. The exceptions are very rare.
Consider the following:
It is very hard to imagine any system that does not feature any level of uniformity of inter-component communication as it would be impossible to develop such a system. Ultimately, as we mentioned in the previous chapter, almost all network interactions are based on the IP protocol, which is a uniform interface.
If there is a uniform communication interface, it can be mimicked if needed, so the requirement of client and server implementation independence can always be met.
If we can create an alternative server, it means we can always have a layered architecture by placing an additional proxy between the client and the server.
As clients are computational machines, they always store some state and cache some data.
Finally, the code-on-demand requirement is a sly one as in a von Neumann architecture, we can always say that the data the client receives actually comprises instructions in some formal language.
Yes, of course, the reasoning above is a sophism, a reduction to absurdity. Ironically, we might take the opposite path to absurdity by proclaiming that REST constraints are never met. For instance, the code-on-demand requirement obviously contradicts the requirement of having an independently-implemented client and server as the client must be able to interpret the instructions the server sends written in a specific language. As for the “S” rule (i.e., the “stateless” constraint), it is very hard to find a system that does not store any client context as it's close to impossible to make anything useful for the client in this case. (And, by the way, Fielding explicitly requires that: “communication … cannot take advantage of any stored context on the server.”)
Finally, in 2008, Fielding himself increased the entropy in the understanding of the concept by issuing a clarification explaining what he actually meant. In this article, among other things, he stated that:
REST API development must focus on describing media types representing resources
The client must be agnostic of these media types
There must not be fixed resource names and operations with resources. Clients must extract this information from the server's responses.
The concept of “Fielding-2008 REST” implies that clients, after somehow obtaining an entry point to the API, must be able to communicate with the server having no prior knowledge of the API and definitely must not contain any specific code to work with the API. This requirement is much stricter than the ones described in the dissertation of 2000. Particularly, REST-2008 implies that there are no fixed URL templates; actual URLs to perform operations with the resource are included in the resource representation (this concept is known as HATEOAS). The dissertation of 2000 does not contain any definitions of “hypermedia” that contradict the idea of constructing such links based on the prior knowledge of the API (such as a specification).
NB: leaving aside the fact that Fielding rather loosely interpreted his own dissertation, let us point out that no system in the world complies with the Fielding-2008 definition of REST.
We have no idea why, out of all the overviews of abstract network-based software architecture, Fielding's concept gained such popularity. It is obvious that Fielding's theory, reflected in the minds of millions of software developers, became a genuine engineering subculture. By reducing the REST idea to the HTTP protocol and the URL standard, the chimera of a “RESTful API” was born, of which nobody knows the definition.
Do we want to say that REST is a meaningless concept? Definitely not. We only aimed to explain that it allows for quite a broad range of interpretations, which is simultaneously its main power and its main weakness.
On one hand, thanks to the multitude of interpretations, the API developers have built a perhaps vague but useful view of “proper” HTTP API architecture. On the other hand, the lack of concrete definitions has made REST API one of the most “holywar”-inspiring topics, and these holywars are usually quite meaningless as the popular REST concept has nothing to do with the REST described in Fielding's dissertation (and even more so, with the REST described in Fielding's manifesto of 2008).
The terms “REST architectural style” and its derivative “REST API” will not be used in the following chapters since it makes no practical sense as we explained above. We referred to the constraints described by Fielding many times in the previous chapters because, let us emphasize it once more, it is impossible to develop distributed client-server APIs without taking them into account. However, HTTP APIs (meaning JSON-over-HTTP endpoints utilizing the semantics described in the HTTP and URL standards) as we will describe them in the following chapter align well with the “average” understanding of “REST / RESTful API” as per numerous tutorials on the Web.
Chapter 35. Components of an HTTP Request and Their Semantics
The third important exercise we must conduct is to describe the format of an HTTP request and response and explain the basic concepts. Many of these may seem obvious to the reader. However, even the basic knowledge we require to move further is scattered across vast and fragmented documentation, and even experienced developers struggle with some of its nuances. Below, we will try to compile a structured overview that is sufficient to design HTTP APIs.
To describe the semantics and formats, we will refer to the brand-new RFC 9110, which replaces no fewer than nine previous specifications dealing with different aspects of the technology. However, a significant volume of additional functionality is still covered by separate standards. In particular, the HTTP caching principles are described in the standalone RFC 9111, while the popular `PATCH` method is omitted in the main RFC and is regulated by RFC 5789.
An HTTP request consists of (1) applying a specific verb to a URL, stating (2) the protocol version, (3) additional meta-information in headers, and (4) optionally, some content (request body):
POST /v1/orders HTTP/1.1
Host: our-api-host.tld
Content-Type: application/json
{
"coffee_machine_id": 123,
"currency_code": "MNT",
"price": "10.23",
"recipe": "lungo",
"offer_id": 321,
"volume": "800ml"
}
An HTTP response to such a request includes (1) the protocol version, (2) a status code with a corresponding message, (3) response headers, and (4) optionally, response content (body):
HTTP/1.1 201 Created
Location: /v1/orders/123
Content-Type: application/json
{
"id": 123
}
NB: in HTTP/2 (and HTTP/3), separate binary frames are used for headers and data instead of the holistic text format. However, this doesn't affect the architectural concepts we will describe below. To avoid ambiguity, we will provide examples in the HTTP/1.1 format. You can find detailed information about the HTTP/2 format here.
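To make the four components tangible, here is a minimal sketch (not a production-grade parser) that splits a raw HTTP/1.1 request into its verb, URL, protocol version, headers, and body:

```python
# A deliberately simplified parser for the HTTP/1.1 text format:
# request line (verb + URL + version), headers, blank line, optional body.

def parse_http_request(raw: str):
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    verb, url, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them
        headers[name.strip().lower()] = value.strip()
    return verb, url, version, headers, body

raw = (
    "POST /v1/orders HTTP/1.1\r\n"
    "Host: our-api-host.tld\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"recipe": "lungo"}'
)
verb, url, version, headers, body = parse_http_request(raw)
print(verb, url, headers["content-type"])
# → POST /v1/orders application/json
```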
1. A URL
A Uniform Resource Locator (URL) is an addressing unit in an HTTP API. Some evangelists of the technology even use the term “URL space” as a synonym for “The World Wide Web.” It is expected that a proper HTTP API should employ an addressing system that is as granular as the subject area itself; in other words, each entity that the API can manipulate should have its own URL.
The URL format is governed by a separate standard developed by an independent body known as the Web Hypertext Application Technology Working Group (WHATWG). The concepts of URLs and Uniform Resource Names (URNs) together constitute a more general entity called Uniform Resource Identifiers (URIs). (The difference between the two is that a URL allows for locating a resource within the framework of some protocol whereas a URN is an “internal” entity name that does not provide information on how to find the resource.)
URLs are decomposed into sub-components, each of which is optional:
- A scheme: a protocol to access the resource (in our case it is always `https:`)
- A host: a top-level address unit in the form of either a domain name or an IP address. A host might contain subdomains.
- A port.
- A path: a URL part between the host (including port) and the `?` or `#` symbols or the end of the line.
  - The path itself is usually decomposed into parts using the `/` symbol as a delimiter. However, the standard does not define any semantics for it.
  - Two paths that differ only in a trailing slash (for example, `/root/leaf` and `/root/leaf/`) are considered different according to the standard, and so are two URLs that differ only in this respect. However, we are unaware of a single argument for differentiating such URLs in practice.
  - Paths may contain `.` and `..` parts, which are supposed to be interpreted similarly to the analogous symbols in file paths (meaning that `/root/leaf`, `/root/./leaf`, and `/root/branch/../leaf` are equivalent).
- A query: a URL part between the `?` symbol and either `#` or the end of the line.
  - A query is usually decomposed into `key=value` pairs split by the `&` character. Again, the standard does not require this or define the semantics.
  - Nor does the standard imply any normalization of the ordering. URLs that differ only in the order of keys in their queries are considered different.
- A fragment (also known as an anchor): a part of a URL that follows the `#` sign.
  - Fragments are usually treated as addresses within the requested document and because of that are often omitted by user agents while executing the request.
  - Two URLs that only differ in fragment parts may be considered equal or not, depending on the context.
In HTTP requests, the scheme, host, and port are usually (but not always) omitted and presumed to be equal to the connection parameters. (Fielding actually names this arrangement one of the biggest flaws in the protocol design.)
NB: the standard also enumerates some legacy components such as logins and passwords in URLs or non-UTF encoding marks, which we consider irrelevant to the topic of API design. Additionally, the standard contains rules for serializing, normalizing, and comparing URLs, knowing which can be useful for an HTTP API developer.
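As a sketch, this decomposition can be observed with Python's standard `urllib.parse` module (the URL itself is a made-up example):

```python
from urllib.parse import urlsplit, parse_qs

url = "https://api.example.com:443/v1/orders?partner_id=123&sort=asc#details"
parts = urlsplit(url)

print(parts.scheme)    # → https
print(parts.hostname)  # → api.example.com
print(parts.port)      # → 443
print(parts.path)      # → /v1/orders
print(parts.query)     # → partner_id=123&sort=asc
print(parts.fragment)  # → details

# The query is conventionally (but not normatively) split into key=value pairs:
print(parse_qs(parts.query))  # → {'partner_id': ['123'], 'sort': ['asc']}
```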
2. Headers
Headers contain metadata associated with a request or a response. They might describe properties of the entities being passed (e.g., `Content-Length`), provide additional information regarding a client or a server (e.g., `User-Agent`, `Date`, etc.), or simply contain additional fields that are not directly related to the request/response semantics (such as `Authorization`).

The important feature of headers is the possibility to read them before the message body is fully transmitted. This allows for altering request or response handling depending on the headers, and it is perfectly fine to manipulate headers while proxying requests. Many network agents actually do this, i.e., add, remove, or modify headers while proxying requests. In particular, modern web browsers automatically add a number of technical headers, such as `User-Agent`, `Origin`, `Accept-Language`, `Connection`, `Referer`, `Sec-Fetch-*`, etc., and modern server software automatically adds or modifies such headers as `X-Powered-By`, `Date`, `Content-Length`, `Content-Encoding`, `X-Forwarded-For`, etc.

This freedom in manipulating headers can result in unexpected problems if an API uses them to transmit data, as the field names developed by an API vendor can accidentally overlap with existing conventional headers, or worse, such a collision might occur in the future at any moment. To avoid this issue, the practice of adding the prefix `X-` to custom header names was frequently used in the past. More than ten years ago this practice was officially discouraged (see the detailed overview in RFC 6648). Nevertheless, the prefix has not been fully abolished, and many semi-standard headers still contain it (notably, `X-Forwarded-For`). Therefore, using the `X-` prefix reduces the probability of collision but does not eliminate it. The same RFC reasonably suggests using the API vendor name as a prefix instead of `X-`. (We would rather recommend using both, i.e., sticking to the `X-ApiName-FieldName` format. Here `X-` is included for readability [to distinguish standard fields from custom ones], and the company or API name part helps avoid collisions with other non-standard header names.)

Additionally, headers are used as control flow instructions for so-called “content negotiation,” which allows the client and server to agree on a response format (through `Accept*` headers) and to perform conditional requests that aim to reduce traffic by skipping response bodies, either fully or partially (through `If-*` headers, such as `If-Range`, `If-Modified-Since`, etc.).
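To illustrate, a conditional request exchange might look like this (a hypothetical interaction; the resource and dates are made up). The client presents a validator it received earlier, and the server responds without a body if the resource has not changed:

```
GET /v1/orders/123 HTTP/1.1
Host: our-api-host.tld
If-Modified-Since: Tue, 01 Aug 2023 00:00:00 GMT

HTTP/1.1 304 Not Modified
Last-Modified: Tue, 01 Aug 2023 00:00:00 GMT
```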
3. HTTP Verbs
One important component of an HTTP request is a method (verb) that describes the operation being applied to a resource. RFC 9110 standardizes eight verbs — namely, `GET`, `POST`, `PUT`, `DELETE`, `HEAD`, `CONNECT`, `OPTIONS`, and `TRACE` — of which we as API developers are interested in the first four. The `CONNECT`, `OPTIONS`, and `TRACE` methods are technical and rarely used in HTTP APIs (except for `OPTIONS`, which needs to be implemented to ensure access to the API from a web browser). Theoretically, the `HEAD` verb, which allows for requesting resource metadata only, might be quite useful in API design. However, for reasons unknown to us, it did not take root in this capacity.

Apart from RFC 9110, many other specifications propose additional HTTP verbs, such as `COPY`, `LOCK`, `SEARCH`, etc. — the full list can be found in the registry. However, only one of them gained widespread popularity — the `PATCH` method. The reasons for this state of affairs are quite trivial: the five methods (`GET`, `POST`, `PUT`, `DELETE`, and `PATCH`) are enough for almost any API.
HTTP verbs define two important characteristics of an HTTP call:
Semantics: what the operation means
Side effects:
Whether the request modifies any resource state or if it is safe (and therefore, whether its result could be cached)
Whether the request is idempotent or not.
The common HTTP API methods are:
- `GET`: Returns a representation of a resource. Safe, idempotent, cannot have a body.
- `PUT`: Replaces (fully overwrites) a resource with a provided entity. Unsafe, idempotent, has a body.
- `DELETE`: Deletes a resource. Unsafe, idempotent, cannot have a body.
- `POST`: Processes a provided entity according to its internal semantics. Unsafe, non-idempotent, can have a body.
- `PATCH`: Modifies (partially overwrites) a resource with a provided entity. Unsafe, non-idempotent, can have a body.
The most important property of modifying idempotent verbs is that the URL serves as an idempotency key for the request. The `PUT /url` operation fully overwrites a resource, so repeating the request won't change the resource. Conversely, retrying a `DELETE /url` request must leave the system in the same state where the `/url` resource is deleted. Regarding the `GET /url` method, it must semantically return the representation of the same target resource `/url`. If it exists, its implementation must be consistent with prior `PUT` / `DELETE` operations. If the resource was overwritten via `PUT /url`, a subsequent `GET /url` call must return a representation that matches the entity enclosed in the `PUT /url` request. In the case of JSON-over-HTTP APIs, this simply means that `GET /url` returns the same data as what was passed in the preceding `PUT /url`, possibly normalized and equipped with default values. On the other hand, a `DELETE /url` call must remove the resource, resulting in subsequent `GET /url` requests returning a `404` or `410` error.
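The symmetry contract described above can be sketched with a toy in-memory resource store (a deliberately simplified model, not a real HTTP server):

```python
# A toy model of the GET / PUT / DELETE symmetry contract.
# Resources live in a dict keyed by URL; status codes are illustrative.

class ResourceStore:
    def __init__(self):
        self._resources = {}

    def put(self, url, entity):
        # Fully overwrites the resource: repeating the call changes nothing
        self._resources[url] = dict(entity)
        return 200

    def delete(self, url):
        # Idempotent: the end state ("no such resource") is the same on retries
        self._resources.pop(url, None)
        return 204

    def get(self, url):
        if url not in self._resources:
            return 404, None
        return 200, self._resources[url]

store = ResourceStore()
store.put("/v1/orders/123", {"recipe": "lungo"})
store.put("/v1/orders/123", {"recipe": "lungo"})  # retry: same state
assert store.get("/v1/orders/123") == (200, {"recipe": "lungo"})

store.delete("/v1/orders/123")
store.delete("/v1/orders/123")  # retry: still deleted
assert store.get("/v1/orders/123") == (404, None)
```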
The idempotency and symmetry of the `GET` / `PUT` / `DELETE` methods imply that neither `GET` nor `DELETE` can have a body, as no reasonable meaning could be associated with it. However, most web server software allows these methods to have bodies and transmits them further to the endpoint handler, likely because many software engineers are unaware of the semantics of the verbs (although we strongly discourage relying on this behavior).
For obvious reasons, responses to modifying endpoints are not cached (though there are some conditions under which a response to a `POST` request can be used as cached data for subsequent `GET` requests). This ensures that repeating `POST` / `PUT` / `DELETE` / `PATCH` requests will hit the server, as no intermediary agent can respond with a cached result. In the case of a `GET` request, this is generally not true. Only the presence of the `no-store` or `no-cache` directives in the response guarantees that a subsequent `GET` request will reach the server.
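For example, a hypothetical response that forbids any caching of an order's status (so that every subsequent `GET` hits the server) might look like this:

```
HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: no-store

{ "status": "preparing" }
```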
One of the most widespread HTTP API design antipatterns is violating the semantics of HTTP verbs:
- Placing modifying operations in a `GET` handler. This can lead to the following problems:
  - Interim agents might respond to such a request using a cached value if a required caching directive is missing, or vice versa, automatically repeat a request upon receiving a network timeout.
  - Some agents consider themselves eligible to traverse hyper-references (i.e., to make HTTP `GET` requests) without the explicit user's consent. For example, social networks and messengers perform such calls to generate a preview for a link when a user tries to share it.
- Placing non-idempotent operations in `PUT` / `DELETE` handlers. Although interim agents do not typically repeat modifying requests regardless of their alleged idempotency, a client or server framework can easily do so. This mistake is often coupled with requiring a body to be passed alongside a `DELETE` request to discern the specific object that needs to be deleted, which per se is a problem as any interim agent might discard such a body.
- Ignoring the `GET` / `PUT` / `DELETE` symmetry requirement. This can manifest in different ways, such as:
  - Making a `GET /url` operation return data even after a successful `DELETE /url` call
  - Making a `PUT /url` operation take the identifiers of the entities to modify from the request body instead of the URL, resulting in the `GET /url` operation's inability to return a representation of the entity passed to the `PUT /url` handler.
4. Status Codes
A status code is a machine-readable three-digit number that describes the outcome of an HTTP request. There are five groups of status codes:
- `1xx` codes are informational. Among these, the `100 Continue` code is probably the only one that is commonly used.
- `2xx` codes indicate that the operation was successful.
- `3xx` codes are redirection codes, implying that additional actions must be taken to consider the operation fully successful.
- `4xx` codes represent client errors.
- `5xx` codes represent server errors.
NB: the separation of codes into groups by the first digit is of practical importance. If the client is unaware of the meaning of an `xyz` code returned by the server, it must act as if an `x00` code was received.
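This fallback rule can be sketched in a few lines of Python (the set of known codes is illustrative, not exhaustive):

```python
# If the exact code is unknown to the client, fall back to the x00 code
# of the same group, as prescribed by the standard.

KNOWN_CODES = {200, 201, 301, 302, 400, 404, 410, 500, 503}

def effective_code(status: int) -> int:
    return status if status in KNOWN_CODES else (status // 100) * 100

print(effective_code(404))  # → 404 (known, used as-is)
print(effective_code(429))  # → 400 (unknown to this client, treated as 400)
print(effective_code(507))  # → 500
```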
The idea behind status codes is obviously to make errors machine-readable so that all interim agents can detect what has happened with a request. The HTTP status code nomenclature effectively describes nearly every problem applicable to an HTTP request, such as invalid `Accept-*` header values, a missing `Content-Length`, unsupported HTTP verbs, excessively long URIs, etc.
Unfortunately, the HTTP status code nomenclature is not well-suited for describing errors in business logic. To return machine-readable errors related to the semantics of the operation, it is necessary either to use status codes unconventionally (i.e., in violation of the standard) or to enrich responses with additional fields. Designing custom errors in HTTP APIs will be discussed in the corresponding chapter.
NB: note the problem with the specification design. By default, all `4xx` codes are non-cacheable, but there are several exceptions, namely the `404`, `405`, `410`, and `414` codes. While we believe this was done with good intentions, the number of developers aware of this nuance is likely to be similar to the number of HTTP specification editors.
One Important Remark Regarding Caching
Caching is a crucial aspect of modern microservice architecture design. It can be tempting to control caching at the protocol level, and the HTTP standard provides various tools to facilitate this. However, the author of this book must warn you: if you decide to utilize these tools, it is essential to thoroughly understand the standard. Flaws in the implementation of certain techniques can result in disruptive behavior. The author personally experienced a major outage caused by the aforementioned lack of knowledge regarding the default cacheability of `404` responses. In this incident, some settings for an important geographical area were mistakenly deleted. Although the problem was quickly localized and the settings were restored, the service remained inoperable in the area for several hours because clients had cached the `404` response and did not request it anew until the cache had expired.
One Important Remark Regarding Consistency
One parameter might be placed in different components of an HTTP request. For example, an identifier of a partner making a request might be passed as part of:
- A domain name, e.g., `{partner_id}.domain.tld`
- A path, e.g., `/v1/{partner_id}/orders`
- A query parameter, e.g., `/v1/orders?partner_id=<partner_id>`
- A header value, e.g.,
  GET /v1/orders HTTP/1.1
  X-ApiName-Partner-Id: <partner_id>
- A field within the request body, e.g.,
  POST /v1/orders/retrieve HTTP/1.1
  {
    "partner_id": <partner_id>
  }

There are also more exotic options, such as placing a parameter in the scheme of a request or in the `Content-Type` header.
However, when we move a parameter between different components, we face three annoying issues:
- Some tokens are case-sensitive (path, query parameters, JSON field names), while others are not (domain and header names).
  - With header values, there is even more chaos: some of them are required to be case-insensitive (e.g., `Content-Type`), while others are prescribed to be case-sensitive (e.g., `ETag`).
- Allowed symbols and escaping rules differ as well:
  - Notably, there is no widespread practice for escaping the `/`, `?`, and `#` symbols in a path.
  - Unicode symbols in domain names are allowed (though not universally supported) through a peculiar encoding technique called “Punycode.”
- Traditionally, different casings are used in different parts of an HTTP request:
  - `kebab-case` in domains, headers, and paths
  - `snake_case` in query parameters
  - `snake_case` or `camelCase` in request bodies.

  Furthermore, using both `snake_case` and `camelCase` in domain names is impossible as the underscore sign is not allowed and capital letters will be lowercased during URL normalization.

Theoretically, it is possible to use `kebab-case` everywhere. However, most programming languages do not allow variable names and object fields in `kebab-case`, so working with such an API would be quite inconvenient.
To wrap this up, the situation with casing is so spoiled and convoluted that there is no consistent solution to employ. In this book, we follow this rule: tokens are cased according to the common practice for the corresponding request component. If a token's position changes, the casing is changed as well. (However, we're far from recommending following this approach unconditionally. Our recommendation is rather to try to avoid increasing the entropy by choosing a solution that minimizes the probability of misunderstanding.)
NB: strictly speaking, JSON stands for “JavaScript Object Notation,” and in JavaScript, the default casing is `camelCase`. However, we dare say that JSON ceased to be a format bound to JavaScript long ago and is now a universal format for organizing communication between agents written in different programming languages. Employing `snake_case` allows for easily moving a parameter from a query to a body, which is the most frequent case, although the inverse solution (i.e., using `camelCase` in query parameter names) is also possible.
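As an illustration of why moving parameters between components is painful, here is a sketch of converting a field name between the two casings discussed above (naive converters, not a complete solution):

```python
import re

def snake_to_camel(name: str) -> str:
    # "partner_id" → "partnerId"
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)

def camel_to_snake(name: str) -> str:
    # "partnerId" → "partner_id"
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()

print(snake_to_camel("partner_id"))  # → partnerId
print(camel_to_snake("partnerId"))   # → partner_id
# Round-tripping is not always lossless: "HTTPRequest"-style names
# do not survive such naive conversion.
```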
This is Chapters 33-35 of “The API” book being written by Sergey Konstantinov.