Introducing http_cache, a BEAM-native standard-compliant HTTP caching library
One year ago I released the http_cache Erlang library, along with 2 Elixir libraries (plug_http_cache and tesla_http_cache) that make use of it.
When I started writing these libs, I thought it would take a few weeks of work to complete them. HTTP caching is harder than I thought, and it took way longer. Why, then, bother writing them when other HTTP caching systems already exist? In this blog post, I intend to explain my motivation and show what features they support.
Prior experience with HTTP caching
At a company where I worked a few years ago, I was tasked with migrating our HTTP caching system from a CDN to an in-house system. CDN costs were skyrocketing due to increased use of our service and abuses from some users or scrapers. Our users were close to our datacenters, so the latency hit was expected to be reasonable.
We discussed 2 options: write an Erlang HTTP caching proxy, or use existing software. We went with the latter, because why reinvent the wheel after all, and chose to deploy Varnish, a well-known HTTP cache server.
For those who have already deployed Varnish and don’t want to feel the pain again, feel free to skip this section. Deploying it turned out to be more cumbersome than we thought.
With Varnish, you first have to master a brand new language: VCL. This is Varnish’s DSL to configure which objects are cached, and for how long (among other things). To give you a taste, here’s an example of a VCL configuration file:
sub vcl_recv {
    set req.http.Host = regsub(req.http.Host, ":[0-9]+", "");
    unset req.http.proxy;
    set req.url = std.querysort(req.url);
    set req.url = regsub(req.url, "\?$", "");
    set req.http.Surrogate-Capability = "key=ESI/1.0";

    if (std.healthy(req.backend_hint)) {
        set req.grace = 10s;
    }

    if (!req.http.X-Forwarded-Proto) {
        if (std.port(server.ip) == 443 || std.port(server.ip) == 8443) {
            set req.http.X-Forwarded-Proto = "https";
        } else {
            set req.http.X-Forwarded-Proto = "http";
        }
    }

    if (req.http.Upgrade ~ "(?i)websocket") {
        return (pipe);
    }

    if (req.method != "GET" &&
        req.method != "HEAD" &&
        req.method != "PUT" &&
        req.method != "POST" &&
        req.method != "TRACE" &&
        req.method != "OPTIONS" &&
        req.method != "PATCH" &&
        req.method != "DELETE") {
        return (pipe);
    }

    if (req.method != "GET" && req.method != "HEAD") {
        return (pass);
    }

    if (req.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|ogg|ogm|opus|otf|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
        unset req.http.Cookie;
        return (hash);
    }
}

sub vcl_hash {
    hash_data(req.http.X-Forwarded-Proto);
}

sub vcl_backend_response {
    if (bereq.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|ogg|ogm|opus|otf|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 1d;
    }

    if (beresp.http.Surrogate-Control ~ "ESI/1.0") {
        unset beresp.http.Surrogate-Control;
        set beresp.do_esi = true;
    }

    set beresp.grace = 6h;
}
The language itself is pretty accessible. You might have noticed some top-level functions (vcl_recv, vcl_hash, vcl_backend_response): these are states of Varnish’s FSM that are called at specific steps of the request/response processing. When writing VCL code, you must understand what these steps do, as shown in the FSM schema:
Like any language, VCL comes with its quirks. For instance, some headers can appear multiple times in an HTTP request or response. Both of the following examples are valid and equivalent:
Accept: text/html, application/xhtml+xml
Accept: text/html
Accept: application/xhtml+xml
However, VCL returns only the value of the first occurrence of the header. It’s possible to get all values by using the header VMOD (an extension), which means either finding the library compiled for the right OS and ABI, or compiling it from source.
For performance reasons, we wanted batch requests for many resources to take advantage of the cache. When hitting:
/resources/1
and then:
/resources?id=1,2,3
we wanted to be able to:
- use the cached version of /resources/1
- retrieve /resources/2 and /resources/3 from the origin (backend) and cache them
Varnish supports ESI (Edge Side Includes), which allows dynamically including chunks of content coming from the cache. This makes access to the cache programmable. This looked promising, but for several reasons (mainly limitations in error handling), we couldn’t use it as we expected. Long story short, we ended up deploying 2 types of Varnish instances:
- one to cache requests to unique resources
- one to cache requests to a batch of resources
We also had to deploy Nginx servers:
- for TLS termination (Varnish OSS doesn’t support it)
- to implement the optimization mentioned above, using Lua scripts
Suddenly that was a lot of services!
We also needed to support invalidation. Varnish supports it by setting a key in a header before caching a response (xkey) and hitting a specific URL (/purge) to delete matching responses. This allows, for instance, deleting all images of a given client and forcing the cache to repopulate with updated images.
It supports it, but our infrastructure didn’t. We had deployed our caching service using docker swarm, and our 3 Varnish instances were “hidden” behind a unique IP address. This is the standard way to do load-balancing with docker swarm.
As a consequence, an invalidation request would only hit one instance of Varnish, and only invalidate cached responses on that instance. Varnish instances do not communicate with each other, and thus do not broadcast invalidation requests. To solve this issue we had to:
- change the docker swarm load-balancing configuration (from vip to dnsrr)
- write a Lua script (again) on our nginx instances that:
  - intercepts purge requests
  - queries the DNS to get the addresses of all Varnish instances
  - duplicates and forwards the invalidation request to each of them
Last but not least, after a few months of running all of it in production, we noticed a strange behaviour with our 40 workers downstream of Varnish. Their workload was very uneven, and after investigating we noticed it was directly correlated to the number of connections opened by Varnish to these workers. Some had 2 or 3 times more open connections from Varnish than the others.
We noticed it was also correlated with the order in which these workers were deployed during rolling upgrades: the first deployed workers had many more connections, and the last ones far fewer.
We finally understood that this is due to the nature of Varnish’s load-balancing and persistent connections: when workers are restarted, Varnish opens new persistent connections using a simple round-robin algorithm, so the workers deployed first will naturally have received more connections by the end of the rollout. Surprisingly, this imbalance persists for a long time.
To know more about this class of issue I’d recommend the following article: Long-lived TCP connections and Load Balancers.
These are some of the challenges you may encounter when deploying such a piece of software. My point is not to rant about Varnish. It’s a very performant piece of software, optimized in subtle ways, and it’s OSS. We successfully deployed it after a few weeks of work, and since then it has served cached content for thousands of requests per second, with high reliability. We save thousands of dollars every month in CDN bills. For this, thanks for your service, Varnish 🫡
But setting up and maintaining additional software comes with a cost, and I was just discovering that (it was my first job as a software engineer).
Project genesis
Some time later, I was watching The Soul of Erlang and Elixir. Remember this slide?
The BEAM being excellent at managing state, among other things, I started thinking we needed a BEAM alternative. Something like this:
http_cache was in its infancy.
HTTP caching is standardized in RFC9111 (which supersedes RFC7234). If you want to get a taste of how complex HTTP caching is, you can read Section 3.
The main difficulty, when caching HTTP responses, is that a request can have several associated responses. It all depends on the request headers, and this is why the vary header exists.
From an implementation perspective, it means that returning a cached response involves:
- retrieving the list of eligible cached responses for a given request
- comparing the current request with the request that triggered caching a response in the first place, to check whether the headers listed in vary match
Other than that, there are less-known requirements such as:
- deleting the associated cached responses when a destructive method is called on a resource. PATCH /resources/1 should invalidate cached responses for /resources/1
- when revalidating (asking the origin server whether the response changed), the headers of the stored response must be updated in case a 304 Not Modified response is received
- the same mechanism applies to HEAD requests
Hence the storage backend for cached responses cannot be a plain key/value store.
Now let’s see how HTTP caching was implemented in these libraries.
http_cache basics
http_cache is a stateless Erlang library. It analyzes requests and responses when caching with http_cache:cache/3, to determine whether they can be cached and for how long, and returns the freshest compatible response when using http_cache:get/2.
1> Req = {<<"GET">>, <<"http://example.org">>, [], <<>>}.
{<<"GET">>,<<"http://example.org">>,[],<<>>}
2> Resp = {200, [{<<"content-type">>, <<"text/plain">>}], <<"Cache me">>}.
{200,[{<<"content-type">>,<<"text/plain">>}],<<"Cache me">>}
3> Opts = #{store => http_cache_store_process, type => shared}.
#{store => http_cache_store_process, type => shared}
4> http_cache:cache(Req, Resp, Opts).
{ok,{200,
[{<<"content-type">>,<<"text/plain">>},
{<<"content-length">>,<<"8">>}],
<<"Cache me">>}}
5> http_cache:get(Req, Opts).
{fresh,{{ <<21,255,141,93,218,86,217,58,55,246,85,151,223,
133,134,248,212,121,102,151,176,244,210,11,46,
...>>,
#{}},
{200,
[{<<"content-type">>,<<"text/plain">>},
{<<"content-length">>,<<"8">>},
{<<"age">>,<<"10">>}],
<<"Cache me">>}}}
% after some time
6> http_cache:get(Req, Opts).
{must_revalidate,{{ <<21,255,141,93,218,86,217,58,55,246,
85,151,223,133,134,248,212,121,102,
151,176,244,210,11,46,...>>,
#{}},
{200,
[{<<"content-type">>,<<"text/plain">>},
{<<"content-length">>,<<"8">>},
{<<"age">>,<<"218">>}],
<<"Cache me">>}}}
% after more time
7> http_cache:get(Req, Opts).
miss
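The vary response header is honored as well: a stored response only matches a request if the headers listed in vary carry the same values as in the original request. Here is a minimal sketch of this behavior (plain shell-style expressions, with the results abbreviated in comments; the URL and header values are purely illustrative):
Opts = #{store => http_cache_store_process, type => shared}.

%% The response declares that it varies on the accept-language request header.
ReqFr = {<<"GET">>, <<"http://example.org">>, [{<<"accept-language">>, <<"fr">>}], <<>>}.
RespFr = {200,
          [{<<"content-type">>, <<"text/plain">>}, {<<"vary">>, <<"accept-language">>}],
          <<"Bonjour">>}.
http_cache:cache(ReqFr, RespFr, Opts).

%% Same URL but a different accept-language value: the stored response does not match.
ReqEn = {<<"GET">>, <<"http://example.org">>, [{<<"accept-language">>, <<"en">>}], <<>>}.
http_cache:get(ReqEn, Opts).   %% miss
http_cache:get(ReqFr, Opts).   %% {fresh, ...}: the French variant is served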
To save space when storing responses, it can automatically compress and decompress content with gzip, when using the right options:
1> Req = {<<"GET">>, <<"http://example.org">>, [{<<"accept-encoding">>, <<"gzip">>}], <<>>}.
{<<"GET">>,<<"http://example.org">>,
[{<<"accept-encoding">>,<<"gzip">>}],
<<>>}
2> Resp = {200, [{<<"content-type">>, <<"text/plain">>}], <<"Cache me">>}.
{200,[{<<"content-type">>,<<"text/plain">>}],<<"Cache me">>}
3> Opts = #{store => http_cache_store_process, auto_compress => true, auto_accept_encoding => true, compression_threshold => 0}.
#{auto_accept_encoding => true,auto_compress => true,
compression_threshold => 0,
store => http_cache_store_process}
4> http_cache:cache(Req, Resp, Opts).
{ok,{200,
[{<<"content-type">>,<<"text/plain">>},
{<<"vary">>,<<"accept-encoding">>},
{<<"content-encoding">>,<<"gzip">>},
{<<"content-length">>,<<"28">>}],
<<31,139,8,0,0,0,0,0,0,3,115,78,76,206,72,85,200,77,5,0,
162,221,237,17,...>>}}
5> http_cache:get(Req, Opts).
{fresh,{{ <<66,6,235,59,78,212,170,99,22,66,97,143,156,120,
56,105,201,159,80,222,159,201,79,88,72,...>>,
#{<<"accept-encoding">> => <<"gzip">>}},
{200,
[{<<"content-type">>,<<"text/plain">>},
{<<"vary">>,<<"accept-encoding">>},
{<<"content-encoding">>,<<"gzip">>},
{<<"content-length">>,<<"28">>},
{<<"age">>,<<"8">>}],
<<31,139,8,0,0,0,0,0,0,3,115,78,76,206,72,85,200,77,5,
0,162,221,...>>}}}
Here we’ve lowered the compression threshold from the default of 1000 bytes, and the compressed content is actually bigger than the uncompressed one, which is why there’s a threshold in the first place.
To avoid compressing binary responses, only a subset of MIME types are compressed by default. Refer to the auto_compress_mime_types option to get the complete list.
Once the response is cached, http_cache responds to range requests:
10> http_cache:get({<<"GET">>, <<"http://example.org">>, [{<<"range">>, <<"bytes=0-4">>}], <<>>}, Opts).
{fresh,{{ <<66,6,235,59,78,212,170,99,22,66,97,143,156,120,
56,105,201,159,80,222,159,201,79,88,72,...>>,
#{}},
{206,
[{<<"content-type">>,<<"text/plain">>},
{<<"content-range">>,<<"bytes 0-4/8">>},
{<<"content-length">>,<<"5">>},
{<<"age">>,<<"60">>}],
<<"Cache">>}}}
This may help support <video/> players, for instance, which use range requests to skip to a specific point in the video. Caching partial responses (206 Partial Content) is not supported by http_cache, so in this scenario caching comes at the cost of loading the full video on the first request.
Revalidation is supported with the http_cache:cache/4 function. It consists of sending a conditional request (with if-modified-since and if-none-match) and analyzing the response. If 304 Not Modified is returned by the origin server, the stored response is updated and returned.
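A rough sketch of the flow, reusing the Req and Opts from the first example (the conditional-request construction here is illustrative, and the exact http_cache:cache/4 arguments are documented in the library rather than shown here):
case http_cache:get(Req, Opts) of
    {must_revalidate, {_RespRef, {_Status, StoredHeaders, _Body}}} ->
        %% Build a conditional request from the stored response's validators.
        CondHeaders =
            [{<<"if-none-match">>, V} || {<<"etag">>, V} <- StoredHeaders] ++
            [{<<"if-modified-since">>, V} || {<<"last-modified">>, V} <- StoredHeaders],
        %% ...send Req with CondHeaders added to the origin server, then pass
        %% the origin's response (a 304 or a new 200) back through
        %% http_cache:cache/4, which updates the stored response and returns
        %% the one to serve.
        {revalidate_with, CondHeaders};
    OtherResult ->
        OtherResult
end.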
Responses can be tagged and later invalidated by tag (called an alternate key) with http_cache:invalidate_by_alternate_key/2:
4> http_cache:cache(Req, Resp, Opts#{alternate_keys => [some_key]}).
{ok,{200,
[{<<"content-type">>,<<"text/plain">>},
{<<"content-length">>,<<"8">>}],
<<"Cache me">>}}
5> http_cache:get(Req, Opts).
{fresh,{{ <<66,6,235,59,78,212,170,99,22,66,97,143,156,120,
56,105,201,159,80,222,159,201,79,88,72,...>>,
#{}},
{200,
[{<<"content-type">>,<<"text/plain">>},
{<<"content-length">>,<<"8">>},
{<<"age">>,<<"4">>}],
<<"Cache me">>}}}
6> http_cache:invalidate_by_alternate_key(some_key, Opts).
{ok,1}
7> http_cache:get(Req, Opts).
miss
Resources can also simply be invalidated by URL (see http_cache:invalidate_url/2).
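For example (a sketch, reusing the Opts from above, and assuming the result has the same shape as with alternate keys):
%% Invalidate every response cached for this exact URL:
http_cache:invalidate_url(<<"http://example.org">>, Opts).
%% expected to return {ok, NbInvalidatedResponses}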
If you’re into REST APIs, you might be interested in the ignore_query_params_order option. As its name suggests, it normalizes the order of query parameters so that, whatever their order in the request, matching cached responses are found.
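A minimal sketch of the effect (the URLs are illustrative):
Opts = #{store => http_cache_store_process, ignore_query_params_order => true}.
Resp = {200, [{<<"content-type">>, <<"application/json">>}], <<"[]">>}.

Req1 = {<<"GET">>, <<"http://example.org/products?page=2&sort=asc">>, [], <<>>}.
Req2 = {<<"GET">>, <<"http://example.org/products?sort=asc&page=2">>, [], <<>>}.

http_cache:cache(Req1, Resp, Opts).
http_cache:get(Req2, Opts).   %% {fresh, ...}: same parameters, different order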
An HTTP cache can be:
- a shared cache - cached responses are shared among all users
- a private cache - cached responses are per user, and may contain private information. In this case, you must use the bucket option to specify which user the cached response belongs to (see the sketch below)
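A hedged sketch of what a per-user private cache could look like, reusing the Req and Resp from the examples above (assuming type accepts private as the counterpart of shared, and that bucket accepts any term identifying the user):
UserOpts = fun(UserId) ->
               #{store => http_cache_store_process, type => private, bucket => UserId}
           end.

http_cache:cache(Req, Resp, UserOpts(<<"user-42">>)).

http_cache:get(Req, UserOpts(<<"user-42">>)).   %% may return {fresh, ...} for this user
http_cache:get(Req, UserOpts(<<"user-43">>)).   %% miss: different bucket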
Almost all of RFC9111 is implemented, except for caching partial responses and combining them when they’re adjacent.
Associated libraries
So far, http_cache is boring because it doesn’t do anything useful on its own. It’s meant to be used by other, higher-level libraries. Two of them were released along with http_cache.
The first is plug_http_cache, a Plug that automatically caches responses (when the cache-control header allows it) and is simple to configure:
defmodule SimpleAPIWeb.Router do
  use SimpleAPIWeb, :router

  pipeline :api do
    plug :accepts, ["json"]
    plug PlugHTTPCache, Application.compile_env(:simple_api, :plug_http_cache_opts)
    plug PlugCacheControl, directives: [:public, s_maxage: 600]
  end

  scope "/api", SimpleAPIWeb do
    pipe_through :api

    get "/products", ProductController, :index
    get "/products/:id", ProductController, :show
  end
end
and that’s it! All responses will be cached. If you need finer control over which responses are cached and which aren’t, you can set cache control headers at the controller level instead of in the router.
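One way to do this (a sketch, not the only option; the controller and context below are hypothetical) is to set the header directly in the action with Plug.Conn.put_resp_header/3, which Phoenix controllers import:
defmodule SimpleAPIWeb.ArticleController do
  use SimpleAPIWeb, :controller

  def show(conn, %{"id" => id}) do
    article = SimpleAPI.Articles.get_article!(id)

    # Only this action's responses are declared cacheable (10 minutes in shared caches).
    conn
    |> put_resp_header("cache-control", "public, s-maxage=600")
    |> render(:show, article: article)
  end
end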
PlugHTTPCache.StaleIfError, when enabled, allows returning stale (not fresh) content when something on the backend fails. Think of a restarting DB, network glitches or a full connection pool. One just has to set the stale-if-error directive in the cached response’s cache-control header:
cache-control: max-age=300, stale-if-error=3600
and plug_http_cache will return the cached response for 5 minutes, and for an additional 60 minutes should the backend be unavailable (in Phoenix terms: if an exception is raised).
The second library is TeslaHTTPCache. This is an HTTP caching middleware for Tesla:
iex> client = Tesla.client([{TeslaHTTPCache, %{store: :http_cache_store_process}}])
%Tesla.Client{
fun: nil,
pre: [{TeslaHTTPCache, :call, [%{store: :http_cache_store_process}]}],
post: [],
adapter: nil
}
iex> Tesla.get!(client, "http://perdu.com")
%Tesla.Env{
method: :get,
url: "http://perdu.com",
query: [],
headers: [
{"cache-control", "max-age=600"},
{"vary", "Accept-Encoding,User-Agent"},
{"content-type", "text/html"},
{"expires", "Sat, 22 Apr 2023 14:25:11 GMT"},
{"last-modified", "Thu, 02 Jun 2016 06:01:08 GMT"},
{"content-length", "204"},
... (omitted for brevity)
],
body: "<html><head><title>Vous Etes Perdu ?</title></head><body><h1>Perdu sur l'Internet ?</h1><h2>Pas de panique, on va vous aider</h2><strong><pre> * <----- vous êtes ici</pre></strong></body></html>\n",
status: 200,
opts: [],
__module__: Tesla,
__client__: %Tesla.Client{...}
}
iex> Tesla.get!(client, "http://perdu.com")
%Tesla.Env{
method: :get,
url: "http://perdu.com",
query: [],
headers: [
{"cache-control", "max-age=600"},
{"vary", "Accept-Encoding,User-Agent"},
{"content-type", "text/html"},
{"expires", "Sat, 22 Apr 2023 14:25:11 GMT"},
{"last-modified", "Thu, 02 Jun 2016 06:01:08 GMT"},
{"content-length", "204"},
{"age", "8"}
... (omitted for brevity)
],
body: "<html><head><title>Vous Etes Perdu ?</title></head><body><h1>Perdu sur l'Internet ?</h1><h2>Pas de panique, on va vous aider</h2><strong><pre> * <----- vous êtes ici</pre></strong></body></html>\n",
status: 200,
opts: [],
__module__: Tesla,
__client__: %Tesla.Client{...}
}
Did you notice the age header in the second response? It means the response was returned by the cache.
Work is underway to implement the same type of middleware for the req library.
Any layer dealing with the HTTP request/response cycle could use http_cache. I’m thinking of:
- a Cowboy middleware
- Erlang and Elixir reverse proxies (such as reverse_proxy_plug)
Stores
You might have noticed the store parameter. http_cache doesn’t store anything itself, and in the previous examples we used the http_cache_store_process store. As its name suggests, it caches responses in the current process. Useful for testing, not so much for production use.
http_cache comes with 2 LRU stores:
- http_cache_store_memory: responses are stored in memory (ETS)
- http_cache_store_disk: responses are stored on disk
Why not a store that stores responses in both memory and on disk? Because it’s not needed.
If you look at http_cache’s types, you might notice that a returned response’s body can be either iodata or a pointer to a file:
-type response() :: {status(), headers(), body() | sendfile()}.

-type body() :: iodata().

-type sendfile() ::
    {sendfile,
     Offset :: non_neg_integer(),
     Length :: non_neg_integer() | all,
     Path :: binary()}.
When using http_cache_store_disk, a sendfile() response body is returned. It’s up to the user to get the file content from the disk. plug_http_cache, however, doesn’t do that, but forwards the information to the adapter, which uses the sendfile system call: the kernel sends the content of the file directly to the socket and caches it in memory (the page cache). The file is never loaded into userland. This is extremely efficient.
Both these stores support clustering. As long as they are in the same cluster and the feature is enabled, they will:
- exchange cached responses between them
- warm up joining nodes by sending them the most recently used responses
- broadcast invalidation requests
Both also implement backpressure mechanisms. An HTTP cache should never block a request/response cycle, which is why calls are non-blocking and caching can be stopped when some limits are reached:
- too many responses are cached - a cleanup process is started as soon as possible to nuke least recently used entries
- too many responses are queued for caching - new ones are dropped
This is essential to make sure the system remains stable. As a consequence, one cannot make any assumption as to whether a response will be cached or not.
Both backends emit telemetry events (http_cache does as well, by the way). You can take a look at plug_http_cache_demo, where telemetry events are transformed into nice dashboards:
(Don’t take the numbers for real-world values - this is a testing environment with 3 Phoenix instances running in Docker and Tsung sending tons of requests - from the same old Thinkpad!)
Programmable caching subsystem
Since HTTP caching is performed within the BEAM, cached objects are directly available to users. The language to query them is, well, HTTP.
This enables optimizations that would otherwise be more complicated to implement. For instance, how do you look up each individual resource of a batch request (/resources?id=1,2,3) in the cache?
Let’s try it in an Elixir Phoenix controller, in lib/simple_api_web/product_controller.ex:
defmodule SimpleAPIWeb.ProductController do
  use SimpleAPIWeb, :controller

  require Logger

  alias SimpleAPI.Products

  @http_cache_opts Application.compile_env(:simple_api, :plug_http_cache_opts)

  def index(conn, %{"ids" => ids}) do
    products =
      ids
      |> String.split(",")
      |> Task.async_stream(fn id -> get_from_cache(id) || Products.get_product!(id) end)
      |> Enum.map(fn {:ok, value} -> value end)

    render(conn, :index, products: products)
  end

  def show(conn, %{"id" => id}) do
    product = Products.get_product!(id)
    render(conn, :show, product: product)
  end

  defp get_from_cache(id) do
    {"GET", url(~p"/api/products/#{id}"), [], ""}
    |> :http_cache.get(@http_cache_opts)
    |> process_cache_response()
  end

  defp process_cache_response({:fresh, {resp_ref, {_status, _headers, {:sendfile, _, _, path}}}}) do
    case File.read(path) do
      {:ok, body} ->
        :http_cache.notify_response_used(resp_ref, @http_cache_opts)
        {:from_cache, body}

      {:error, _} ->
        nil
    end
  end

  defp process_cache_response({:fresh, {resp_ref, {_status, _headers, body}}}) do
    :http_cache.notify_response_used(resp_ref, @http_cache_opts)
    {:from_cache, body}
  end

  defp process_cache_response(_) do
    nil
  end
end
lib/simple_api_web/product_json.ex:
defmodule SimpleAPIWeb.ProductJSON do
  alias SimpleAPI.Products.Product

  @doc """
  Renders a list of products.
  """
  def index(%{products: products}) do
    for(product <- products, do: data(product))
  end

  @doc """
  Renders a single product.
  """
  def show(%{product: product}) do
    data(product)
  end

  defp data(%Product{} = product) do
    %{
      id: product.id,
      name: product.name
    }
  end

  defp data({:from_cache, body}) do
    Jason.Fragment.new(body)
  end
end
(Note the opportune use of Jason.Fragment.new/1, which avoids decoding the individual JSON payloads and encoding them again.)
When using a traditional DB, we’d probably not need to do that (especially retrieving products one by one). This might come in handy, however, when hitting external APIs or slower databases such as some NoSQL DBs.
We could do the opposite and cache individual products when processing a list of products. In practice, we’d need to encode the body (easy) and set the headers like Phoenix would have done for an individual request (harder).
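A sketch of the easy half, as a helper one could add to the controller above (cache_individually/1 is hypothetical, and the hand-written headers are an assumption, not what Phoenix would actually have produced):
# Called from index/2 for each product: store it as if it had been requested
# individually, so that a later GET /api/products/:id can be served from cache.
defp cache_individually(product) do
  req = {"GET", url(~p"/api/products/#{product.id}"), [], ""}

  resp =
    {200,
     [
       {"content-type", "application/json; charset=utf-8"},
       {"cache-control", "public, s-maxage=600"}
     ], Jason.encode!(%{id: product.id, name: product.name})}

  :http_cache.cache(req, resp, @http_cache_opts)
end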
Beyond HTTP, http_cache can actually be used to cache any data, GraphQL responses for example. As long as a method, a URI and cache-control headers are set, http_cache won’t care.
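For instance, a GraphQL query result could be stored and retrieved like any other response (a hedged sketch; the URI and payload are made up):
Opts = #{store => http_cache_store_process}.

GraphQLReq = {<<"GET">>, <<"http://backend.internal/graphql?query={product(id:1){name}}">>, [], <<>>}.
GraphQLResp = {200,
               [{<<"content-type">>, <<"application/json">>},
                {<<"cache-control">>, <<"max-age=60">>}],
               <<"{\"data\":{\"product\":{\"name\":\"Spork\"}}}">>}.

http_cache:cache(GraphQLReq, GraphQLResp, Opts).
http_cache:get(GraphQLReq, Opts).   %% {fresh, ...} for the next minute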
Conclusion
http_cache is intended to give another option to the BEAM world when it comes to HTTP caching, with a solution that:
- is native BEAM code and doesn’t require additional infrastructure dependencies
- takes advantage of the BEAM’s clustering capabilities
- uses standard BEAM tools (telemetry…)
- conforms as much as possible to RFCs
- and remains simple
http_cache is not thoroughly load-tested at the time of writing, due to a lack of time and resources. Hopefully, most of the work is done to provide a viable HTTP caching layer on the BEAM, but feel free to leave feedback if you’re using it, or if you encounter problems with these libraries. Until then, they’ll remain pre-v1.0.
I’m also working on licensed versions of some of these libraries, available on the excellent codecodeship platform, but that’s a story for another day.