When building web services, a common piece of wisdom is to try to reduce the number of
HTTP requests to improve performance.
There are a variety of benefits to this, including fewer total bytes being
sent, but the predominant reason is that, traditionally, browsers only make
6 HTTP requests in parallel for a single domain. Before 2008, most browsers
limited this to 2.
When this limit is reached, browsers have to wait until earlier requests are
finished before starting new ones. One implication is that the higher the
latency is, the longer it takes until all requests finish.
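As a rough back-of-the-envelope illustration (the numbers are made up, not taken from
the simulation below): with 100 requests, a 6-connection limit and a 60 ms round-trip,
the requests queue up into roughly 100 / 6 ≈ 17 sequential batches, which is around a
full second spent on latency alone.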
Take a look at an example of this behavior. In the following simulation we’re
fetching a ‘main’ document. This could be the index of a website, or some
JSON collection.
After getting the main document, the simulator grabs 99 linked items. These
could be images, scripts, or other documents from an API.
The 6 connection limit has resulted in a variety of optimization techniques.
Scripts are combined and compressed, and graphics are often combined into
‘sprite maps’.
The limit and ‘cost’ of a single HTTP connection have also had an effect on web
services. Instead of creating small, specific API calls, designers of REST
(and other HTTP-based) services are incentivized to pack many logical ‘entities’
into a single HTTP request/response.
For example, when an API client needs a list of ‘articles’ from an API, it will
usually get this list from a single endpoint instead of fetching each article
by its own URI.
The savings are massive. The following simulation is similar to the last,
except now we’ve combined all entities in a single request.
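As an illustration (the exact shape varies per API and per standard), such a combined
response might embed every article directly in the collection:

GET /articles HTTP/1.1
Host: api.example.org

HTTP/1.1 200 OK
Content-Type: application/json

{
  "total": 2,
  "items": [
    { "id": 1, "title": "First article", "body": "..." },
    { "id": 2, "title": "Second article", "body": "..." }
  ]
}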
If an API client needs a specific (large) set
of entities from a server, then in order to reduce HTTP requests, API developers
are compelled to either build more API endpoints, each tailored to the specific
use-case of the client, or deploy systems that can take arbitrary queries and
return all the matching entities.
The simplest form of this is perhaps a collection with many query parameters;
a much more complex version is GraphQL, which effectively uses HTTP as a pipe
for its own request/response mechanism and allows a wide range of arbitrary
queries.
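A query-parameter driven collection like that might, for instance, look something like
this (the endpoint and parameters are made up):

GET /articles?author=42&published_after=2019-01-01&fields=title,summary&limit=50 HTTP/1.1
Host: api.example.org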
Drawbacks of compounding documents
There are a number of drawbacks to this. Systems that require compounding of
entities typically need additional complexity on both server and client.
Instead of treating a single entity as an object that has a URI, which
can be fetched with GET and subsequently cached, a new layer is required
on both server- and client-side that’s responsible for teasing these entities
apart.
Re-implementing the logic HTTP already provides also has a nasty side-effect:
other features from HTTP must be reimplemented as well. The most common
example is caching.
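For contrast, this is roughly the machinery HTTP itself already offers for an
individually addressable entity: a conditional GET using an ETag (the value here is
made up):

GET /articles/1 HTTP/1.1
Host: api.example.org
If-None-Match: "5d8c72a5"

HTTP/1.1 304 Not Modified
ETag: "5d8c72a5"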
On the REST side of things, examples of compound documents can be seen in
virtually any standard: JSON:API, HAL and Atom all have this notion.
If you look at most full-featured JSON:API client implementations, you will
usually see that these clients ship with some kind of ‘entity store’,
allowing them to keep track of which entities they have received, effectively
maintaining an equivalent of a HTTP cache.
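At its core, such an entity store is little more than a map from URI to entity, which
is exactly the job a HTTP cache already does. A toy sketch, not modeled on any
particular library:

// Toy entity store: keeps track of received entities by URI,
// effectively duplicating what a HTTP cache would do for free.
class EntityStore {
  constructor() {
    this.entities = new Map();
  }
  store(uri, entity) {
    this.entities.set(uri, entity);
  }
  get(uri) {
    return this.entities.get(uri);
  }
  has(uri) {
    return this.entities.has(uri);
  }
}

// Usage:
// const store = new EntityStore();
// store.store('/articles/1', { title: 'First article' });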
Another issue is that, for some of these systems, it’s typically
harder for clients to request just the data they need. Since entities are
combined into compound documents, it’s either all-or-nothing, or it requires
significant complexity on client and server (see GraphQL).
A more lofty drawback is that API designers may have trended towards systems
that are more opaque, and no longer part of the web of information, due
to a lack of the interconnectedness that linking affords.
HTTP/2 and HTTP/3
HTTP/2 is now widely available, and in HTTP/2 the cost of an HTTP request is
significantly lower. Whereas an HTTP/1.1 connection can only handle one request
at a time, with HTTP/2 a single connection is opened per domain, and many
requests can flow through it in parallel, potentially out of order.
Instead of delegating parallelism to compound documents, we can now actually
rely on the protocol itself to handle this.
Using many HTTP/2 requests instead of compound HTTP/1.1 requests has many
advantages:
- It’s no longer required for (browser) applications to tease out many
entities from a single response. Everything can just be fetched with GET.
Instead of collections embedding their items, they can just point to
them (see the sketch after this list).
- If a browser has a cached copy of (some of) the items in a collection,
it can intelligently skip the request or quickly get a 304 Not Modified
back.
- It’s possible for some items to arrive faster than others, if they were
finished earlier. This allows interfaces to render items as they arrive, instead
of waiting for everything to arrive at once.
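As a sketch of what this looks like from the browser: a client that fetches a
link-based collection (using the ‘_links’ shape from the example in the next section)
and then fetches every item in parallel. Over HTTP/2 all of these requests share a
single connection. The URL is made up.

async function fetchCollection(url) {
  // One request for the collection itself, which only contains links.
  const collection = await (await fetch(url)).json();

  // Fetch every linked item in parallel; over HTTP/2 these requests all
  // flow over the same connection, and cached items resolve quickly.
  return Promise.all(
    collection._links.item.map(link =>
      fetch(new URL(link.href, url)).then(response => response.json())
    )
  );
}

// Usage:
// fetchCollection('https://api.example.org/articles').then(console.log);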
HTTP/2 Push
There are still benefits that a combined request has over many individual requests.
Let’s use a real example. We’re building a blog API that has a list of articles.
When we request the list, instead of returning every article, we now just
return a list of links:
GET /articles HTTP/1.1
Host: api.example.org

HTTP/1.1 200 OK
Content-Type: application/json

{
  "_links": {
    "item": [
      { "href": "https://api.example.org/articles/1" },
      { "href": "https://api.example.org/articles/2" },
      { "href": "https://api.example.org/articles/3" },
      { "href": "https://api.example.org/articles/4" },
      { "href": "https://api.example.org/articles/5" },
      { "href": "https://api.example.org/articles/6" },
      { "href": "https://api.example.org/articles/7" }
    ]
  },
  "total": 7
}
For a client to get the full list of articles, it first needs to fetch
the collection, wait for the response, and then fetch every item in parallel.
This adds a full round-trip and roughly doubles the latency.
Another issue is that the server now needs to process 8 requests: one for
the collection, and then 1 per item. It’s often much cheaper to generate the
entire list at once. This is sometimes referred to as the N+1 query problem.
The problem might potentially be eliminated with HTTP/2 Server Push. Server
Push is a new feature in HTTP/2 that allows the server to take the initiative
and send additional responses before the client has actually requested them.
Unfortunately this method also has a drawback: the server does not know
which resources a client already has cached. It can only assume it must
send everything, or try to intelligently guess what the client might need.
There was a proposal in the works to resolve this by letting the browser
inform the server of its cache via a Bloom filter. I believe this has
unfortunately been abandoned.
So you can either fully eliminate the initial latency, or you can reduce
traffic thanks to caching, but not both.
The ideal might be a mixture of the two. I’ve been working on a specification
that allows HTTP clients to specify, via an HTTP header, which link relationships
they would like to receive. It’s called Prefer Push, and a
request looks a little bit like this:
GET /articles HTTP/2
Prefer-Push: item
Host: api.example.org
If a server supports this header, it knows that the client wants
all the resources linked with the ‘item’ relationship, and it can start pushing
them as early as possible.
On the server-side, a fictional controller in a fictional framework might
handle this request as follows:
function articlesIndex(request, response, connection) {

  // Respond with the collection itself: just links to each article.
  const articles = articleServer.getIndex();
  response.body = articles.toLinks();

  // If the client asked for the 'item' relationship, push every article.
  if (request.prefersPush('item')) {
    for (const article of articles) {
      connection.push(
        article.url,
        article.toJson()
      );
    }
  }

}
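For comparison, a rough sketch of the same idea using Node’s built-in http2 module
might look like this. The articleServer object and the certificate paths are made up
for the example; pushStream(), respond() and pushAllowed are the actual Node.js APIs.
This is not a reference implementation of Prefer Push, just one way a server could
react to the header.

const http2 = require('http2');
const fs = require('fs');

// Made-up in-memory data source, purely for illustration.
const articleServer = {
  getIndex: () => [
    { url: '/articles/1', title: 'First article' },
    { url: '/articles/2', title: 'Second article' },
  ],
};

const server = http2.createSecureServer({
  key: fs.readFileSync('server.key'),
  cert: fs.readFileSync('server.crt'),
});

server.on('stream', (stream, headers) => {
  if (headers[':path'] !== '/articles') {
    stream.respond({ ':status': 404 });
    stream.end();
    return;
  }

  const articles = articleServer.getIndex();

  // If the client asked for the 'item' relationship, push each article.
  const preferPush = headers['prefer-push'] || '';
  if (preferPush.includes('item') && stream.pushAllowed) {
    for (const article of articles) {
      stream.pushStream({ ':path': article.url }, (err, pushStream) => {
        if (err) return;
        pushStream.respond({ ':status': 200, 'content-type': 'application/json' });
        pushStream.end(JSON.stringify(article));
      });
    }
  }

  // Respond with the collection itself: just links to the items.
  stream.respond({ ':status': 200, 'content-type': 'application/json' });
  stream.end(JSON.stringify({
    _links: { item: articles.map(a => ({ href: a.url })) },
  }));
});

server.listen(8443);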
The CORS problem
A major drawback that’s worth pointing out is CORS. CORS originally
opened the door to making HTTP requests from a web application hosted on one
domain to an API hosted on another.
It does this with a few different facilities, but the one that specifically kills
performance is the preflight request.
When doing ‘unsafe’ cross-domain requests, the browser will start off by doing
an OPTIONS request, allowing the server to explicitly opt in to the request.
In practice most API requests are ‘unsafe’. The implication is that the latency
of each individual HTTP request at least doubles.
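For illustration, a preflight exchange looks roughly like this; the origins and header
values are examples:

OPTIONS /articles/1 HTTP/1.1
Host: api.example.org
Origin: https://app.example.com
Access-Control-Request-Method: PUT
Access-Control-Request-Headers: Content-Type

HTTP/1.1 204 No Content
Access-Control-Allow-Origin: https://app.example.com
Access-Control-Allow-Methods: PUT
Access-Control-Allow-Headers: Content-Type
Access-Control-Max-Age: 86400

Only after this exchange completes does the actual request go out.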
What’s interesting is that Macromedia Flash had this same issue, and solved it
by creating a domain-wide cross-origin request policy. All you had
to do was create a crossdomain.xml file at the root of your domain, and once
Flash had read the policy it would remember it.
Every few months I search to see if someone is working on a modern version of
this for JavaScript, and this time I found a W3C Draft Specification.
Here’s hoping browser vendors pick this up!
A less elegant workaround is to host a ‘proxy script’ on the API’s
domain. Embedded via an <iframe>, it has unrestricted access to its own
‘origin’, and the parent web application can communicate with it via
window.postMessage().
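A very rough sketch of that workaround, with made-up URLs, file names and message
shapes:

// On the web application's side (e.g. https://app.example.com): embed a
// hidden proxy page that lives on the API's domain and relay requests to it.
const API_ORIGIN = 'https://api.example.org';

const proxyFrame = document.createElement('iframe');
proxyFrame.src = API_ORIGIN + '/proxy.html';
proxyFrame.style.display = 'none';

const proxyReady = new Promise(resolve => {
  proxyFrame.addEventListener('load', resolve);
});
document.body.appendChild(proxyFrame);

async function proxiedFetch(path) {
  await proxyReady;
  return new Promise(resolve => {
    const onMessage = event => {
      if (event.origin !== API_ORIGIN) return;
      window.removeEventListener('message', onMessage);
      resolve(event.data);
    };
    window.addEventListener('message', onMessage);
    proxyFrame.contentWindow.postMessage({ path }, API_ORIGIN);
  });
}

// Inside proxy.html, on the API's domain, something like this would run:
//
//   window.addEventListener('message', async event => {
//     if (event.origin !== 'https://app.example.com') return;
//     const response = await fetch(event.data.path);
//     event.source.postMessage(await response.json(), event.origin);
//   });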
The perfect world
In a perfect world, HTTP/3 is already widely available, improving performance
even further; browsers have a standard mechanism to send cache digests;
clients inform the server of the link relationships they want, allowing API
servers to push any resources clients may need as early as possible; and
domain-wide origin policies are a thing.
This last simulation shows an example of how that might look. In the example
below the browser has a warmed-up cache and an ETag for every item.
When doing a request to find out whether the collection has new or updated
items, the client includes a cache digest, and the server responds by pushing
just the resources that have changed.
Real-world performance testing
We’re lacking some of these ‘perfect world’ features, but we can still work
with what we’ve got. We have access to HTTP/2 Server Push, and requests are cheap.
Since HTTP/2, ‘many small HTTP endpoints’ has felt to me like the most
elegant design, but does the performance hold up? Some evidence could really
help.
My goal for this performance test is to fetch a collection of items in the
following different ways:
- h1 – Individual HTTP/1.1 requests.
- h1-compound – A HTTP/1.1 compound collection.
- h2 – Individual HTTP/2 requests.
- h2-compound – HTTP/2 compound collection.
- h2-cache – A HTTP/2 collection, every item individually fetched. Warm cache.
- h2-cache-stale – A HTTP/2 collection, every item individually fetched. Warm cache, but needs revalidation.
- h2-push – HTTP/2, no cache, but every item is pushed.
My prediction
In theory, the same amount of information is sent and the same work is done for a
compound request vs. HTTP/2 pushed responses.
However, I think there’s still enough per-request overhead in HTTP/2
that compound requests probably still have a leg up.
The real benefit will show when caching comes into play. For a given
collection in a typical API I think it’s fair to assume that many items may
be cached.
It seems logical to assume that the tests that skip 90% of the work are
also the fastest.
So from fastest to slowest, this is my prediction.
- h2-cache – A HTTP/2 collection, every item individually fetched. Warm cache.
- h2-cache-stale – A HTTP/2 collection, every item individually fetched. Warm cache, but needs revalidation.
- h2-compound – HTTP/2 compound collection.
- h1-compound – A HTTP/1.1 compound collection.
- h2-push – HTTP/2, no cache, but every item is pushed.
- h2 – Individual HTTP/2 requests.
- h1 – Individual HTTP/1.1 requests.
First test setup and initial observations
I initially started testing with a local Node.js (version 12) service.
All HTTP/1.1 tests are done over SSL, and the HTTP/2 tests run on a different
port.
To simulate latency, I added a delay of between 40 and 80 milliseconds to every
HTTP request.
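The artificial latency was just a delay along these lines (an Express-style sketch,
not my exact code):

// Express-style middleware sketch: delay every incoming request by a
// random 40-80 ms before passing it on to the real handler.
function fakeLatency(req, res, next) {
  const delay = 40 + Math.random() * 40;
  setTimeout(next, delay);
}

// app.use(fakeLatency);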
Here’s what my first testing tool looked like:
I ran into a number of issues right away. Chrome disables the cache for self-
signed certificates. I was not able to figure out how to get Chrome to
accept my self-signed certificate on localhost, so I initially gave up on this
and tested with Firefox.
On Firefox, Server Push seemed unreliable. It often only worked the second time I
ran the Push test.
But the most surprising thing was that in Firefox, serving items from the
local cache was only marginally faster than serving fresh responses with my
artificial latency. Running these tests several times, in many cases serving
items from cache was actually slower than going to the server and requesting
a new copy.
Given these results, I had to improve my test setup.
Better tests
This is the basic set-up for the second test:
- I’m repeating each test 50 times.
- I’m running the server on an AWS t2.medium instance in us-west-2.
- My testing is done over residential internet. The fake latency has been removed.
- I’m using LetsEncrypt SSL certificates.
- For each browser I’m running the test twice:
  - Once with a collection containing 25 items.
  - Once with a collection containing 500 items.
Test 1: 25 requests
A few things are interesting in this graph.
First, we expected HTTP/1.1 separate requests to be the slowest, so no
surprise there. It’s really there to provide a baseline.
The second slowest is individual HTTP/2 requests.
This only gets marginally improved by HTTP/2 push or caching.
Chrome and Firefox mostly have the same results. Let’s zoom in on the Chrome
results:
Test | Median time (s) | % of h1, no cache
---|---|---
h1, no cache | 0.490 | 100%
h1, compound | 0.147 | 30%
h1, 90% cached | 0.213 | 43%
h2, no cache | 0.276 | 56%
h2, compound | 0.147 | 30%
h2, 90% cached | 0.221 | 45%
h2, 90% not modified | 0.243 | 49%
h2, push | 0.215 | 44%
Compound requests are by far the fastest. This indicates that my original
guess was wrong: even when caching comes into play, it still can’t beat
just re-sending the entire collection in a single compound response.
Caching does marginally improve on not caching.
Test 2: 500 requests
So let’s do the big test. In this test we expect the differences to increase
in some areas, because more requests simply take longer, and we expect
some differences to decrease. In particular, the effect of the ‘initial’
request should be de-emphasized.
These graphs suggest that:
- Chrome is the slowest browser for the tests that have the most requests.
- Firefox is the slowest for the tests that use Push, and the test that’s
mostly served from the browser cache.
This kinda matched my own observations. Push on Firefox seemed a little
unreliable, and using the cache seemed slow.
Test | Chrome % | Firefox %
---|---|---
h1, no cache | 100.0% | 84.51%
h1, compound | 5.57% | 5.61%
h1, 90% cached | 14.60% | 11.30%
h2, no cache | 18.20% | 10.55%
h2, compound | 5.91% | 5.73%
h2, 90% cached | 7.86% | 8.52%
h2, 90% not modified | 12.78% | 11.22%
h2, push | 9.02% | 10.68%
What we can tell here is that at 500 requests, doing compound requests is around
1.8x faster on Firefox, and around 3.26x faster on Chrome.
The biggest surprise is the speed of browser caches. Our ‘normal’ test does
501 HTTP requests. The tests that warm the cache only do 51 requests.
These results show that doing 501 requests takes around 2.3x as long as doing
51 requests in Chrome. In Firefox it’s only 1.2x.
In other words, the total time Firefox needs to request something from
its cache is only marginally less than getting that resource from the other
side of the continent. I was very surprised by this.
This made me wonder whether Firefox’s cache is just slow in general, or especially
bad at highly concurrent access. I have no evidence for this, but it felt like
Firefox’s cache might have some bad global locking behavior.
Another thing that stands out is that Chrome appears to perform especially
badly when firing off all 500 requests in parallel: more than twice as slow as
Firefox. The massive difference made me doubt my results, but I re-ran the
tests later and got similar outcomes every time.
We also see that the benefit of using Push becomes less pronounced, as we only
really save time by reducing the latency of the first request.
Conclusions
My tests are imperfect. They test Node’s HTTP/2 implementation as much as they
test HTTP/2 in general. To get real proof, I think it’s important
to test more situations.
My server implementation might also not have been the best one. My service
served files from the filesystem, but a system under real load might behave
differently.
So treat these results as evidence, but not proof.
The evidence tells me that if speed is the most important requirement, you
should continue to use compound responses.
I do believe, though, that the results are close enough that it may be worth
taking the performance hit in exchange for a potentially simpler system design.
It also appears that caching does not make a significant difference.
Potentially due to poor browser optimization, doing a fresh HTTP request
can often take just as long as serving the resource from cache. This is
especially true in Firefox.
I also believe that the effect of Push was visible but not massive. The biggest
benefit of Push is on the first load of a new collection, and it will also
become more important for avoiding the N+1 query problem. Pushing responses earlier
is mostly useful if:
- The server can really benefit from generating the responses all at once.
- There are multiple hops needed in the API to get all its data; an
intelligent push mechanism can be very effective at reducing the compound
latency.
Short summary:
- If speed is the overriding requirement, keep using compound documents.
- If a simpler, more elegant API is most important, having many smaller-scoped endpoints is definitely viable.
- Caching only makes a bit of difference.
- Optimizations benefit the server more than the client.
However, I still doubt some of these results. If I had more time, I would try
to test this with a server written in Go, and simulate the server-side
conditions more realistically.
My wishlist for 2020
I’m ending this post with a wish list for 2020 and beyond:
- Browser support for a domain-wide cross-domain policy.
- HTTP/3 available on many server-side systems.
- Cache Digest Bloom filters in browsers that use ETag.
- Prefer-Push adoption by REST API engineers.
- Better parallel request performance in Chrome.
- Less buggy HTTP/2 Push and better cache performance in Firefox.
It’s an ambitious wish list. If we do arrive at this point, I believe
it will be a major step towards making our REST clients simpler again,
having our server implementations make fewer trade-offs between performance and
simplicity, and treating our browsers and servers as a reliable, fast engine for
synchronizing URI-indexed resource state.