When building web services, common wisdom is to reduce the number of
HTTP requests to improve performance.

There are a variety of benefits to this, including fewer total bytes being
sent, but the predominant reason is that traditionally browsers will only make
6 HTTP requests in parallel for a single domain. Before 2008, most browsers
limited this to 2.

When this limit is reached, it means that browsers will have to wait until
earlier requests are finished before starting new ones. One implication is that
the higher the latency is, the longer it will take until all requests finish.

Take a look at an example of this behavior. In the following simulation we’re
fetching a ‘main’ document. This could be the index of a website, or some
JSON collection.

After getting the main document, the simulator grabs 99 linked items. These
could be images, scripts, or other documents from an API.
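
As a rough back-of-the-envelope model of this scenario (a sketch, not the simulator's actual code), assume every request costs exactly one round trip of fixed latency and that at most `maxParallel` requests run at once. The function name and the 100 ms latency are illustrative assumptions.

```javascript
// Rough model: every request takes one round trip of `latencyMs`,
// and at most `maxParallel` requests are in flight at once.
// Ignores bandwidth, server time, and connection setup.
function estimateLoadTimeMs(itemCount, maxParallel, latencyMs) {
  // One round trip for the main document, which must finish first.
  const mainRoundTrips = 1;
  // Linked items are fetched in batches of `maxParallel`.
  const itemRoundTrips = Math.ceil(itemCount / maxParallel);
  return (mainRoundTrips + itemRoundTrips) * latencyMs;
}

const withSixConnections = estimateLoadTimeMs(99, 6, 100);
const withTwoConnections = estimateLoadTimeMs(99, 2, 100);
console.log(withSixConnections); // 1800
console.log(withTwoConnections); // 5100
```

Under these assumptions the 99 items need 17 batches with 6 connections, so latency is paid 18 times in total rather than twice.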

The 6-connection limit has resulted in a variety of optimization techniques.
Scripts are combined and compressed, and graphics are often combined into
‘sprite maps’.

The limit and ‘cost’ of a single HTTP connection have also had an effect on web
services. Instead of creating small, specific API calls, designers of REST
(and other HTTP) services are incentivized to pack many logical ‘entities’
in a single HTTP request/response.

For example, when an API client needs a list of ‘articles’ from an API, usually
they will get this list from a single endpoint instead of fetching each article
by its own URI.

The savings are massive. The following simulation is similar to the last,
except now we’ve combined all entities in a single request.

If an API client needs a specific (large) set
of entities from a server, in order to reduce HTTP requests, API developers will
be compelled to either build more API endpoints, each giving a result
that is tailored to the specific use-case of the client, or deploy systems
that can take arbitrary queries and return all the matching entities.

The simplest form of this is perhaps a collection with many query parameters,
and a much more complex version of this is GraphQL, which effectively uses
HTTP as a pipe for its own request/response mechanism and allows for a wide range
of arbitrary queries.

Drawbacks of compounding documents

There are a number of drawbacks to this. Systems that require compounding of
entities typically need additional complexity on both server and client.

Instead of treating a single entity as some object that has a URI, which
can be fetched with GET and subsequently cached, a new layer is required
on both server and client-side that’s responsible for teasing these entities
apart.

Re-implementing the logic HTTP already provides also has a nasty side-effect
that other features from HTTP must also be reimplemented. The most common
example is caching.

On the REST-side of things, examples of compound-documents can be seen in
virtually any standard. JSON:API, HAL and Atom all have
this notion.

If you look at most full-featured JSON:API client implementations, you will
usually see that these clients ship with some kind of ‘entity store’,
allowing them to keep track of which entities they received, effectively
maintaining an equivalent of an HTTP cache.
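
To make the idea concrete, here is a minimal sketch of such an entity store. It assumes a HAL-style compound document where embedded entities live under `_embedded` and each carries a `_links.self.href`; the `EntityStore` class and the sample document are hypothetical and not taken from any particular client library.

```javascript
// Minimal sketch of a client-side 'entity store' for HAL-style documents.
class EntityStore {
  constructor() {
    // Maps a URI to the most recently seen representation.
    this.entities = new Map();
  }

  // Stores a document and, recursively, everything embedded in it.
  add(doc) {
    const self = doc._links && doc._links.self;
    if (self && self.href) {
      this.entities.set(self.href, doc);
    }
    for (const embedded of Object.values(doc._embedded || {})) {
      // In HAL an embedded relation holds one resource or an array.
      const list = Array.isArray(embedded) ? embedded : [embedded];
      list.forEach((item) => this.add(item));
    }
  }

  // Returns the stored entity, or undefined if it was never received.
  get(uri) {
    return this.entities.get(uri);
  }
}

// Hypothetical compound document with two embedded articles.
const store = new EntityStore();
store.add({
  _links: { self: { href: '/articles' } },
  _embedded: {
    item: [
      { _links: { self: { href: '/articles/1' } }, title: 'First' },
      { _links: { self: { href: '/articles/2' } }, title: 'Second' },
    ],
  },
});
console.log(store.get('/articles/2').title); // Second
```

This store has no notion of freshness or validation, which is exactly the part an HTTP cache would otherwise handle.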

Another issue is that for some of these systems it’s typically
harder for clients to request just the data they need. Since entities are often
combined in compound documents, it’s either all-or-nothing or significant
complexity on client and server (see GraphQL).

A more lofty drawback is that API designers may have trended towards systems
that are more opaque, and are no longer part of the web of information due
to a lack of the interconnectedness that linking affords.

HTTP/2 and HTTP/3

HTTP/2 is now widely available. In HTTP/2 the cost of HTTP requests is
significantly lower. Whereas HTTP/1.1 needs one TCP connection per concurrent
request, HTTP/2 opens a single connection per domain. Many requests can flow
through it in parallel, and potentially complete out of order.

Instead of delegating parallelism to compound documents, we can now actually
rely on the protocol itself to handle this.

Using many HTTP/2 requests instead of compound HTTP/1.1 requests has many
advantages:

  • It’s no longer required for (browser) applications to tease out many
    entities from a single response. Everything can just be fetched with GET.
    Instead of collections embedding their items, they can just point to
    them.
  • If a browser has a cached copy of (some of) the items in a collection,
    it can intelligently skip the request or quickly get a 304 Not Modified
    response.
  • It’s possible for some items to arrive faster than others, if the server
    finished them earlier. This allows interfaces to render items as they
    arrive, instead of waiting for everything to arrive at once.
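
The pattern described in the list above could be sketched as follows, assuming a collection document that lists its items under `_links.item`. The function name, the injected `fetchJson` helper, and the in-memory data are all hypothetical; injecting the fetch function just lets the sketch run without a network.

```javascript
// Fetches a collection, then every linked item in parallel.
// `fetchJson` is injected so the sketch runs without a network;
// in a browser it could be: (url) => fetch(url).then((r) => r.json())
async function fetchCollectionItems(collectionUrl, fetchJson, onItem) {
  const collection = await fetchJson(collectionUrl);
  const links = collection._links.item;
  // Start all item requests at once; over HTTP/2 they can share
  // one connection instead of queueing behind a connection limit.
  await Promise.all(
    links.map(async (link) => {
      const item = await fetchJson(link.href);
      // Called as soon as this item arrives, in arrival order.
      onItem(item);
    })
  );
}

// Usage with an in-memory stand-in for the server (hypothetical data).
const fakeServer = {
  '/articles': {
    _links: { item: [{ href: '/articles/1' }, { href: '/articles/2' }] },
  },
  '/articles/1': { title: 'First' },
  '/articles/2': { title: 'Second' },
};
const fakeFetchJson = async (url) => fakeServer[url];

fetchCollectionItems('/articles', fakeFetchJson, (item) => {
  console.log('rendered', item.title);
});
```

Because each item is an ordinary GET on its own URI, the browser's HTTP cache can serve or revalidate items individually without any extra client code.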

HTTP/2 Push

There are still benefits that combined requests have over many small requests.

Let’s use a real example. We’re building a blog API that has a list of articles.
When we request the list, instead of returning every article we’re now just
returning a list of links:

GET /articles HTTP/1.1
Host: api.example.org

HTTP/1.1 200 OK
Content-Type: application/json

{
  "_links": {
    "item": [
      { "href": "http://www.webdesignernews.com/articles/1" },
      { "href": "http://www.webdesignernews.com/articles/2" },
      { "href": "http://www.webdesignernews.com/articles/3" },
      { "href": "http://www.webdesignernews.com/articles/4" },
      { "href": "http://www.webdesignernews.com/articles/5" },
      { "href": "http://www.webdesignernews.com/articles/6" },
      { "href": "http://www.webdesignernews.com/articles/7" }
    ]
  },
  "total": 7
}

For a client to get the full list of articles, it first needs to fetch
the collection, wait for the response and then fetch every item in parallel.
This doubles the latency.

Another issue is that the server now needs to process 8 requests: one for
the collection, and then one per item. It’s often much cheaper to generate the
entire list at once. This is sometimes referred to as the N+1 query problem.

The problem might be eliminated with HTTP/2 Server Push. Server
Push is a new feature in HTTP/2 that allows the server to take the initiative
and send additional responses before the client has actually requested them.

Unfortunately this method also has a drawback. The server does not know
what resources a client already has cached. It can only assume it must
send everything, or try to intelligently guess what they might need.

There was a proposal in the works to resolve this, by letting the browser
inform the server of its cache contents via a bloom filter. I believe this is
unfortunately now abandoned.

So you can either fully eliminate the initial latency, or you can have
a reduced amount of traffic due to caching, but not both.

The ideal might be a mixture of the two. I’ve been working on a specification
for allowing HTTP clients to specify what link relationships they would
like to receive via an HTTP header. It’s called Prefer Push, and a
request looks a little bit like this:

GET /articles HTTP/2
Prefer-Push: item
Host: api.example.org

If a server supports this header, it knows that the client will want
all the linked resources with the ‘item’ relationship and can start pushing
them as early as possible.
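
A server needs to read the header before it can act on it. The sketch below assumes the header value is a comma-separated list of link relation names; that format and the helper name `parsePreferPush` are assumptions for illustration, so check the Prefer Push draft for the exact syntax.

```javascript
// Parses a Prefer-Push header value into a set of relation names.
// ASSUMPTION: the value is a comma-separated list such as "item, author".
function parsePreferPush(headerValue) {
  if (!headerValue) {
    return new Set();
  }
  return new Set(
    headerValue
      .split(',')
      // Relation names are normalised to lowercase before comparing.
      .map((rel) => rel.trim().toLowerCase())
      .filter((rel) => rel.length > 0)
  );
}

const wanted = parsePreferPush('item, author');
console.log(wanted.has('item'));    // true
console.log(wanted.has('comment')); // false
```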

On the server-side, a fictional controller in a fictional framework might
handle this request as follows:

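function articlesIndex(request, response, connection) {

  const articles = articleServer.getIndex();
  response.body = articles.toLinks();

  if (request.prefersPush('item')) {

    for (const article of articles) {
      // Hypothetical push call in this fictional framework:
      // send each article before the client asks for it.
      connection.push(article.url, article.toJson());
    }

  }

}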

The CORS problem

A major drawback that’s worth pointing out is CORS. CORS originally
opened the door to making HTTP requests from a web application
that’s hosted on one domain to an API hosted on another domain.

It does this with a few different facilities, but one that specifically kills
performance is the preflight request.

When doing ‘unsafe’ cross-domain requests, the browser will start off by doing
an OPTIONS request, allowing the server to explicitly opt-in to requests.

In practice most API requests are ‘unsafe’. The implication is that the latency
of each individual HTTP request at least doubles.
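
As a rough illustration of why most API calls end up ‘unsafe’, here is a simplified sketch of the check a browser applies. It only looks at the method and the Content-Type header; the real rules in the Fetch specification cover more headers and edge cases, and the function name is hypothetical.

```javascript
// Simplified: methods and content types that do NOT trigger a preflight.
const SIMPLE_METHODS = ['GET', 'HEAD', 'POST'];
const SIMPLE_CONTENT_TYPES = [
  'application/x-www-form-urlencoded',
  'multipart/form-data',
  'text/plain',
];

// Returns true if a cross-origin request would need an OPTIONS preflight.
// ASSUMPTION: only method and Content-Type are considered here.
function needsPreflight(method, contentType) {
  if (!SIMPLE_METHODS.includes(method.toUpperCase())) {
    return true;
  }
  if (contentType === undefined) {
    return false;
  }
  // Ignore parameters such as "; charset=utf-8".
  const mediaType = contentType.split(';')[0].trim().toLowerCase();
  return !SIMPLE_CONTENT_TYPES.includes(mediaType);
}

console.log(needsPreflight('GET'));                      // false
console.log(needsPreflight('POST', 'application/json')); // true
console.log(needsPreflight('PUT', 'text/plain'));        // true
```

Request headers such as Authorization also trigger a preflight, so even plain GET requests to an authenticated API are usually affected.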

What’s interesting is that Macromedia Flash also had this issue, and they
solved it by creating a domain-wide cross-origin request policy. All you had
to do was create a crossdomain.xml file at the root of your domain, and once
Flash read the policy it would remember it.

Every few months I search to see if someone is working on a modern version of
this for JavaScript, and this time I’ve found a W3C Draft Specification.
Here’s hoping browser vendors pick this up!

A less elegant workaround is to host a ‘proxy script’ on the API’s
domain. Embedded via an