Introducing http_cache, a BEAM-native standard-compliant HTTP caching library

Posted on Oct 4, 2023

One year ago I released the http_cache Erlang library along with 2 Elixir libraries (plug_http_cache and tesla_http_cache) that make use of it.

When I started writing these libs, I thought it would take a few weeks of work to complete them. HTTP caching is harder than I thought, and it took way longer. Why, then, bother writing them when other HTTP caching systems already exist? In this blog post, I intend to explain my motivation and show what features they support.

Prior experience with HTTP caching

At a company where I worked a few years ago, I was tasked with migrating our HTTP caching system from a CDN to an in-house system. CDN costs were skyrocketing due to increased use of our service and abuse from some users and scrapers. Our users were close to our datacenters, so the latency hit was expected to be reasonable.

We discussed 2 options: write an Erlang HTTP caching proxy, or use existing software. We went with the latter, because why reinvent the wheel after all, and chose to deploy Varnish, a well-known HTTP cache server.

For those who have already deployed Varnish and don’t want to feel the pain again, feel free to skip this section. Deploying it turned out to be more cumbersome than we thought.

With Varnish, you first have to master a brand new language: VCL. This is Varnish’s DSL for configuring which objects are cached, and for how long (among other things). To give you a taste, here’s an example of a VCL configuration file:

sub vcl_recv {
    set req.http.Host = regsub(req.http.Host, ":[0-9]+", "");
    unset req.http.proxy;
    set req.url = std.querysort(req.url);
    set req.url = regsub(req.url, "\?$", "");
    set req.http.Surrogate-Capability = "key=ESI/1.0";

    if (std.healthy(req.backend_hint)) {
        set req.grace = 10s;
    }

    if (!req.http.X-Forwarded-Proto) {
        if (std.port(server.ip) == 443 || std.port(server.ip) == 8443) {
            set req.http.X-Forwarded-Proto = "https";
        } else {
            set req.http.X-Forwarded-Proto = "http";
        }
    }

    if (req.http.Upgrade ~ "(?i)websocket") {
        return (pipe);
    }

    if (req.method != "GET" &&
        req.method != "HEAD" &&
        req.method != "PUT" &&
        req.method != "POST" &&
        req.method != "TRACE" &&
        req.method != "OPTIONS" &&
        req.method != "PATCH" &&
        req.method != "DELETE") {
        return (pipe);
    }

    if (req.method != "GET" && req.method != "HEAD") {
        return (pass);
    }

    if (req.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|ogg|ogm|opus|otf|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
        unset req.http.Cookie;
        return(hash);
    }
}

sub vcl_hash {
    hash_data(req.http.X-Forwarded-Proto);
}

sub vcl_backend_response {
    if (bereq.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|ogg|ogm|opus|otf|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 1d;
    }

    if (beresp.http.Surrogate-Control ~ "ESI/1.0") {
        unset beresp.http.Surrogate-Control;
        set beresp.do_esi = true;
    }

    set beresp.grace = 6h;
}

The language itself is pretty accessible. You might have noticed some top-level functions (vcl_recv, vcl_hash, vcl_backend_response): these are states of Varnish’s FSM that are called at specific steps of the request/response processing. When writing VCL code, you must understand what these steps do, as shown in the FSM schema:

VCL FSM

Like any language, VCL comes with its quirks. For instance, some headers can appear multiple times in an HTTP request or response. The two following examples are valid and equivalent:

Accept: text/html, application/xhtml+xml
Accept: text/html
Accept: application/xhtml+xml

However, VCL returns only the value of the first occurrence of a header. It’s possible to get all the values by using the header VMOD (an extension), either by finding the library compiled for the right OS and ABI, or by compiling it from source.

For performance reasons, we also wanted batch requests for many resources to make use of the cache. When hitting:

/resources/1

and then:

/resources?id=1,2,3

we wanted to be able to:

  • use the cached version of /resources/1
  • retrieve /resources/2 and /resources/3 from the origin (backend) and cache them

Varnish supports ESI, which allows dynamically including chunks of data coming from the cache. This makes access to the cache programmable. It looked promising, but for several reasons (mainly limitations with error handling), we couldn’t use it as we expected. Long story short, we ended up deploying 2 types of Varnish instances:

  • one to cache requests to unique resources
  • one to cache requests to a batch of resources

We also had to deploy Nginx servers:

  • for TLS termination (Varnish OSS doesn’t support it)
  • to implement the optimization mentioned above, using Lua scripts

Suddenly that was a lot of services!

We also needed to support invalidation. Varnish supports it by setting a key in a header before caching a response (xkey) and hitting a specific URL (/purge) to delete matching responses. This allows, for instance, deleting all the images of a given client and forcing the cache to repopulate with updated images.

Varnish supports it, but our infrastructure didn’t. We had deployed our caching service using Docker Swarm, and our 3 Varnish instances were “hidden” behind a unique IP address. This is the standard way to do load-balancing with Docker Swarm.

As a consequence, an invalidation request would only hit one Varnish instance, and invalidate cached responses on that instance only. Varnish instances do not communicate with each other, and thus do not broadcast invalidation requests. To solve this issue, we had to:

  • change the Docker Swarm load-balancing configuration (from vip to dnsrr)
  • write a Lua script (again) on our Nginx instances that:
    • intercepts purge requests
    • queries the DNS to get the addresses of all Varnish instances
    • duplicates and forwards the invalidation request to them

Last but not least, after a few months of running all of this in production, we noticed a strange behaviour with our 40 workers downstream of Varnish. Their workloads were very uneven, and after investigating we noticed it was directly correlated with the number of connections opened by Varnish to these workers. Some had 2 or 3 more open connections from Varnish than the others.

We noticed it was also correlated with the order these worker were deployed using rolling upgrades: the first deployed workers had many more connections, and the lasts way less.

We finally understood that this was due to the nature of Varnish load-balancing combined with persistent connections: when workers are restarted, Varnish opens new persistent connections using a simple round-robin algorithm, and the first deployed workers will naturally have received more connections by the end of the process. Surprisingly, this imbalance persists for a long time.

Connection imbalance with load-balancing and persistent connections

To learn more about this class of issue, I’d recommend the following article: Long-lived TCP connections and Load Balancers.

These are some of the challenges you may encounter when deploying such a piece of software. My point is not to rant about Varnish: it’s a very performant piece of software, optimized in subtle ways, and it’s OSS. We successfully deployed it after a few weeks of work, and since then it has served cached content for thousands of requests per second, with high reliability. We save thousands of dollars every month in CDN bills. For this, thanks for your service, Varnish 🫡

But setting up and maintaining additional software comes with a cost, and I had just discovered that (it was my first job as a software engineer).

Project genesis

Some time later, I was watching The Soul of Erlang and Elixir. Remember this slide?

Soul of Erlang slide

The BEAM being excellent at managing state, among other things, I started thinking we needed a BEAM alternative. Something like this:

Soul of Erlang slide - with HTTP caching

http_cache was in its infancy.

HTTP caching is standardized in RFC9111 (which supersedes RFC7234). If you want a taste of how complex HTTP caching is, you can read Section 3.

The main difficulty, when caching HTTP responses, is that a request can have several associated responses. It all depends on the request headers, and this is why the vary header exists.

From an implementation perspective, it means that returning a cached response involves:

  • retrieving the list of eligible cached responses for a given request
  • comparing the current request with the request that triggered caching a response in the first place, to check that the headers listed in vary match (see the sketch below)

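To make this concrete, here’s a minimal sketch of vary matching from Elixir, assuming the http_cache_store_process store introduced further down (URLs and header values are just examples):

opts = %{store: :http_cache_store_process, type: :shared}

# The origin declares that the response depends on the accept-encoding request header
resp = {200, [{"content-type", "text/plain"}, {"vary", "accept-encoding"}], "Cache me"}

req_gzip = {"GET", "http://example.org", [{"accept-encoding", "gzip"}], ""}
req_plain = {"GET", "http://example.org", [], ""}

{:ok, _} = :http_cache.cache(req_gzip, resp, opts)

# A request with the same accept-encoding value matches the stored response...
{:fresh, _} = :http_cache.get(req_gzip, opts)

# ...but a request without it does not, because the header listed in vary differs
:miss = :http_cache.get(req_plain, opts)
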
Other than that, there are lesser-known requirements such as:

  • deleting the associated cached responses when a destructive method is called on a resource: PATCH /resources/1 should invalidate cached responses for /resources/1
  • when revalidating (asking the origin server whether the response has changed), the headers of the stored response must be updated if a 304 Not Modified response is received
  • the same mechanism applies to HEAD requests

Hence the storage backend for cached responses cannot be a plain key/value store.

Now let’s see how HTTP caching is implemented in these libraries.

http_cache basics

http_cache is a stateless Erlang library. When caching with http_cache:cache/3, it analyzes the request and response to determine whether the response can be cached and for how long, and http_cache:get/2 returns the freshest compatible response.

1> Req = {<<"GET">>, <<"http://example.org">>, [], <<>>}.
{<<"GET">>,<<"http://example.org">>,[],<<>>}

2> Resp = {200, [{<<"content-type">>, <<"text/plain">>}], <<"Cache me">>}.
{200,[{<<"content-type">>,<<"text/plain">>}],<<"Cache me">>}

3> Opts = #{store => http_cache_store_process, type => shared}.
#{store => http_cache_store_process, type => shared}

4> http_cache:cache(Req, Resp, Opts).
{ok,{200,
     [{<<"content-type">>,<<"text/plain">>},
      {<<"content-length">>,<<"8">>}],
     <<"Cache me">>}}

5> http_cache:get(Req, Opts).
{fresh,{{ <<21,255,141,93,218,86,217,58,55,246,85,151,223,
           133,134,248,212,121,102,151,176,244,210,11,46,
           ...>>,
         #{}},
        {200,
         [{<<"content-type">>,<<"text/plain">>},
          {<<"content-length">>,<<"8">>},
          {<<"age">>,<<"10">>}],
         <<"Cache me">>}}}

% after some time
6> http_cache:get(Req, Opts).
{must_revalidate,{{ <<21,255,141,93,218,86,217,58,55,246,
                     85,151,223,133,134,248,212,121,102,
                     151,176,244,210,11,46,...>>,
                   #{}},
                  {200,
                   [{<<"content-type">>,<<"text/plain">>},
                    {<<"content-length">>,<<"8">>},
                    {<<"age">>,<<"218">>}],
                   <<"Cache me">>}}}

% after more time
7> http_cache:get(Req, Opts).
miss

To save space when storing responses, it can automatically compress and decompress content with gzip when the right options are used:

1> Req = {<<"GET">>, <<"http://example.org">>, [{<<"accept-encoding">>, <<"gzip">>}], <<>>}.
{<<"GET">>,<<"http://example.org">>,
 [{<<"accept-encoding">>,<<"gzip">>}],
 <<>>}

2> Resp = {200, [{<<"content-type">>, <<"text/plain">>}], <<"Cache me">>}.
{200,[{<<"content-type">>,<<"text/plain">>}],<<"Cache me">>}

3> Opts = #{store => http_cache_store_process, auto_compress => true, auto_accept_encoding => true, compression_threshold => 0}.
#{auto_accept_encoding => true,auto_compress => true,
  compression_threshold => 0,
  store => http_cache_store_process}

4> http_cache:cache(Req, Resp, Opts).
{ok,{200,
     [{<<"content-type">>,<<"text/plain">>},
      {<<"vary">>,<<"accept-encoding">>},
      {<<"content-encoding">>,<<"gzip">>},
      {<<"content-length">>,<<"28">>}],
     <<31,139,8,0,0,0,0,0,0,3,115,78,76,206,72,85,200,77,5,0,
       162,221,237,17,...>>}}

5> http_cache:get(Req, Opts).
{fresh,{{ <<66,6,235,59,78,212,170,99,22,66,97,143,156,120,
           56,105,201,159,80,222,159,201,79,88,72,...>>,
         #{<<"accept-encoding">> => <<"gzip">>}},
        {200,
         [{<<"content-type">>,<<"text/plain">>},
          {<<"vary">>,<<"accept-encoding">>},
          {<<"content-encoding">>,<<"gzip">>},
          {<<"content-length">>,<<"28">>},
          {<<"age">>,<<"8">>}],
         <<31,139,8,0,0,0,0,0,0,3,115,78,76,206,72,85,200,77,5,
           0,162,221,...>>}}}

Here we’ve lowered the compression threshold from the default of 1000 bytes, and the compressed content is actually bigger than the uncompressed content, which is why there’s a threshold in the first place.

To avoid compressing binary content, only a subset of MIME types is compressed by default. Refer to the auto_compress_mime_types option for the complete list.

Once the response is cached, http_cache responds to range requests:

10> http_cache:get({<<"GET">>, <<"http://example.org">>, [{<<"range">>, <<"bytes=0-4">>}], <<>>}, Opts).
{fresh,{{ <<66,6,235,59,78,212,170,99,22,66,97,143,156,120,
           56,105,201,159,80,222,159,201,79,88,72,...>>,
         #{}},
        {206,
         [{<<"content-type">>,<<"text/plain">>},
          {<<"content-range">>,<<"bytes 0-4/8">>},
          {<<"content-length">>,<<"5">>},
          {<<"age">>,<<"60">>}],
         <<"Cache">>}}}

This may help support <video/> players, for instance, which use range requests to skip to a specific time in the video. Caching partial responses (206 Partial Content) is not supported by http_cache; in this scenario, it would come at the cost of loading the full video on the first request.

Revalidation is supported with the http_cache:cache/4 function. It consists of sending a conditional request (with if-modified-since and if-none-match) and analyzing the response. If 304 Not Modified is returned by the origin server, the stored response is updated and returned.

Responses can be tagged and later invalidated by tag (called alternate key) with http_cache:invalidate_by_alternate_key/2:

4> http_cache:cache(Req, Resp, Opts#{alternate_keys => [some_key]}).
{ok,{200,
     [{<<"content-type">>,<<"text/plain">>},
      {<<"content-length">>,<<"8">>}],
     <<"Cache me">>}}
5> http_cache:get(Req, Opts).
{fresh,{{ <<66,6,235,59,78,212,170,99,22,66,97,143,156,120,
           56,105,201,159,80,222,159,201,79,88,72,...>>,
         #{}},
        {200,
         [{<<"content-type">>,<<"text/plain">>},
          {<<"content-length">>,<<"8">>},
          {<<"age">>,<<"4">>}],
         <<"Cache me">>}}}
6> http_cache:invalidate_by_alternate_key(some_key, Opts).
{ok,1}
7> http_cache:get(Req, Opts).
miss

Resources can also simply be invalidated by URL (see http_cache:invalidate_url/2).
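
A quick sketch of URL invalidation, again with the process store (the URL is illustrative):

opts = %{store: :http_cache_store_process}
req = {"GET", "http://example.org/products/1", [], ""}
resp = {200, [{"cache-control", "max-age=300"}], "product 1"}

{:ok, _} = :http_cache.cache(req, resp, opts)

# Invalidate every response cached for this URL
:http_cache.invalidate_url("http://example.org/products/1", opts)

:miss = :http_cache.get(req, opts)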

If you’re into REST APIs, you might be interested in the ignore_query_params_order option. As its name suggests, it normalizes the order of query parameters so that matching cached responses are found regardless of how the parameters are ordered.
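
A minimal sketch of this option in action (URLs are illustrative):

opts = %{store: :http_cache_store_process, ignore_query_params_order: true}

resp = {200, [{"content-type", "application/json"}, {"cache-control", "max-age=60"}], "[]"}

{:ok, _} = :http_cache.cache({"GET", "http://example.org/products?page=2&sort=asc", [], ""}, resp, opts)

# Same parameters, different order: still a hit
{:fresh, _} = :http_cache.get({"GET", "http://example.org/products?sort=asc&page=2", [], ""}, opts)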

An HTTP cache can be:

  • a shared cache - cached responses are shared among all users
  • a private cache - cached responses are per user and may contain private information. In this case, you must use the bucket option to specify which user a cached response belongs to (see the sketch below)

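Here’s a hedged sketch of a private cache with one bucket per user (the bucket values are arbitrary terms of your choosing):

req = {"GET", "http://example.org/me", [], ""}
resp = {200, [{"cache-control", "private, max-age=60"}], "Alice's data"}

alice_opts = %{store: :http_cache_store_process, type: :private, bucket: "alice"}
bob_opts = %{store: :http_cache_store_process, type: :private, bucket: "bob"}

{:ok, _} = :http_cache.cache(req, resp, alice_opts)

# Alice gets her cached response back, Bob's bucket is untouched
{:fresh, _} = :http_cache.get(req, alice_opts)
:miss = :http_cache.get(req, bob_opts)
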
Almost all of RFC9111 is implemented, except for caching partial responses and combining them when they’re adjacent.

Associated libraries

So far, http_cache may look boring, because it doesn’t do anything useful on its own. It’s meant to be used by other, higher-level libraries. Two of them were released along with http_cache.

The first is plug_http_cache, a Plug that automatically caches responses (when the cache-control header allows it) and is simple to configure:

defmodule SimpleAPIWeb.Router do
  use SimpleAPIWeb, :router

  pipeline :api do
    plug :accepts, ["json"]
    plug PlugHTTPCache, Application.compile_env(:simple_api, :plug_http_cache_opts)
    plug PlugCacheControl, directives: [:public, s_maxage: 600]
  end

  scope "/api", SimpleAPIWeb do
    pipe_through :api

    get "/products", ProductController, :index
    get "/products/:id", ProductController, :show
  end
end

and that’s it! All responses will be cached. If you need finer control over which responses are cached and which aren’t, you can set cache-control headers at the controller level instead of in the router.
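
For instance, instead of the PlugCacheControl plug above, the controller action itself can decide. A sketch (put_resp_header/3 comes from Plug.Conn, and the directive values are just examples):

def show(conn, %{"id" => id}) do
  product = Products.get_product!(id)

  conn
  # plug_http_cache will pick this header up and cache the rendered response
  |> put_resp_header("cache-control", "public, s-maxage=600")
  |> render(:show, product: product)
end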

PlugHTTPCache.StaleIfError, when enabled, allows returning stale (not fresh) content when something on the backend fails. Think of a restarting DB, network glitches or a full connection pool. One just has to set the stale-if-error cache-control directive on the cached response:

cache-control: max-age=300, stale-if-error=3600

and plug_http_cache will return the cached response for 5 minutes, and for an additional 60 minutes should the backend become unavailable (in Phoenix terms: if an exception is raised).

The second library is TeslaHTTPCache. This is an HTTP caching middleware for Tesla:

iex> client = Tesla.client([{TeslaHTTPCache, %{store: :http_cache_store_process}}])
%Tesla.Client{
  fun: nil,
  pre: [{TeslaHTTPCache, :call, [%{store: :http_cache_store_process}]}],
  post: [],
  adapter: nil
}
iex> Tesla.get!(client, "http://perdu.com")
%Tesla.Env{
  method: :get,
  url: "http://perdu.com",
  query: [],
  headers: [
    {"cache-control", "max-age=600"},
    {"vary", "Accept-Encoding,User-Agent"},
    {"content-type", "text/html"},
    {"expires", "Sat, 22 Apr 2023 14:25:11 GMT"},
    {"last-modified", "Thu, 02 Jun 2016 06:01:08 GMT"},
    {"content-length", "204"},
    ... (omitted for brevity)
  ],
  body: "<html><head><title>Vous Etes Perdu ?</title></head><body><h1>Perdu sur l'Internet ?</h1><h2>Pas de panique, on va vous aider</h2><strong><pre>    * <----- vous &ecirc;tes ici</pre></strong></body></html>\n",
  status: 200,
  opts: [],
  __module__: Tesla,
  __client__: %Tesla.Client{...}
}

iex> Tesla.get!(client, "http://perdu.com")
%Tesla.Env{
  method: :get,
  url: "http://perdu.com",
  query: [],
  headers: [
    {"cache-control", "max-age=600"},
    {"vary", "Accept-Encoding,User-Agent"},
    {"content-type", "text/html"},
    {"expires", "Sat, 22 Apr 2023 14:25:11 GMT"},
    {"last-modified", "Thu, 02 Jun 2016 06:01:08 GMT"},
    {"content-length", "204"},
    {"age", "8"}
    ... (omitted for brevity)
  ],
  body: "<html><head><title>Vous Etes Perdu ?</title></head><body><h1>Perdu sur l'Internet ?</h1><h2>Pas de panique, on va vous aider</h2><strong><pre>    * <----- vous &ecirc;tes ici</pre></strong></body></html>\n",
  status: 200,
  opts: [],
  __module__: Tesla,
  __client__: %Tesla.Client{...}
}

Noticed the age header in the second response? It means the response has been returned by the cache.

Work is underway to implement the same type of middleware for the req library.

Any layer dealing with the HTTP request/response cycle could use http_cache, be it on the server side (like plug_http_cache) or on the client side (like tesla_http_cache).

Stores

You might have noticed the store parameter: http_cache doesn’t store anything itself, and in the previous examples we used the http_cache_store_process store. As its name suggests, it caches responses in the current process. Useful for testing, not so much for production.

http_cache comes with 2 LRU stores:

  • http_cache_store_memory - caches responses in memory
  • http_cache_store_disk - caches response bodies on disk

Why not a store that keeps responses both in memory and on disk? Because it’s not needed. If you look at http_cache’s types, you’ll notice that a returned response body can be either iodata or a pointer to a file:

-type response() :: {status(), headers(), body() | sendfile()}.

-type body() :: iodata().

-type sendfile() ::
    {sendfile,
     Offset :: non_neg_integer(),
     Length :: non_neg_integer() | all,
     Path :: binary()}.

When using http_cache_store_disk, a sendfile() response body is returned. It’s up to the user to get the file content from disk. plug_http_cache, however, doesn’t do that: it forwards the information to the adapter, which uses the sendfile system call. The kernel sends the content of the file directly to the socket and caches it in memory (the page cache); the file is never loaded into userland. This is extremely efficient.
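
If you were consuming http_cache directly (without plug_http_cache), handling both body shapes could look like this sketch, where Plug.Conn.send_file/5 lets the adapter use sendfile (module name is hypothetical, and header handling is omitted for brevity):

defmodule MyApp.CacheResponder do
  import Plug.Conn

  # Respond with a cached http_cache response, whatever its body shape

  def send_cached(conn, {status, _headers, {:sendfile, offset, length, path}}) do
    # the adapter can use the sendfile system call: the file never enters userland
    send_file(conn, status, path, offset, length)
  end

  def send_cached(conn, {status, _headers, body}) do
    send_resp(conn, status, body)
  end
end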

Both of these stores support clustering. As long as the nodes are in the same cluster and the feature is enabled, they will:

  • exchange cached responses with each other
  • warm up joining nodes by sending them the most recently used responses
  • broadcast invalidation requests

Both also implement backpressure mechanisms. An HTTP cache should never block the request/response cycle, which is why calls are non-blocking and caching can be stopped when certain limits are reached:

  • too many responses are cached - a cleanup process is started as soon as possible to nuke the least recently used entries
  • too many responses are queued for caching - new ones are dropped

This is essential to keep the system stable. As a consequence, you cannot assume that a given response will actually be cached.

Both backends emit telemetry events (and so does http_cache, by the way). You can take a look at plug_http_cache_demo, where telemetry events are turned into nice dashboards:

plug_http_cache_demo dashboard

(Don’t take the numbers for real-world values - this is a testing environment with 3 Phoenix instances running in Docker and Tsung sending tons of requests - from the same old Thinkpad!)

Programmable caching subsystem

Since HTTP caching is performed within the BEAM, cached objects are directly available to users. The language to query them is, well, HTTP.

This enables optimizations that would otherwise be more complicated to implement. For instance, how do you look up each individual resource of a batch request (/resources?id=1,2,3) in the cache? Let’s try it in an Elixir Phoenix controller:

lib/simple_api_web/product_controller.ex:

defmodule SimpleAPIWeb.ProductController do
  use SimpleAPIWeb, :controller

  require Logger

  alias SimpleAPI.Products

  @http_cache_opts Application.compile_env(:simple_api, :plug_http_cache_opts)

  def index(conn, %{"ids" => ids}) do
    products =
      ids
      |> String.split(",")
      |> Task.async_stream(fn id -> get_from_cache(id) || Products.get_product!(id) end)
      |> Enum.map(fn {:ok, value} -> value end)

    render(conn, :index, products: products)
  end

  def show(conn, %{"id" => id}) do
    product = Products.get_product!(id)
    render(conn, :show, product: product)
  end

  defp get_from_cache(id) do
    {"GET", url(~p"/api/products/#{id}"), [], ""}
    |> :http_cache.get(@http_cache_opts)
    |> process_cache_response()
  end

  defp process_cache_response({:fresh, {resp_ref, {_status, _headers, {:sendfile, _, _, path}}}}) do
    case File.read(path) do
      {:ok, body} ->
        :http_cache.notify_response_used(resp_ref, @http_cache_opts)

        {:from_cache, body}

      {:error, _} ->
        nil
    end
  end

  defp process_cache_response({:fresh, {resp_ref, {_status, _headers, body}}}) do
    :http_cache.notify_response_used(resp_ref, @http_cache_opts)

    {:from_cache, body}
  end

  defp process_cache_response(_) do
    nil
  end
end

lib/simple_api_web/product_json.ex:

defmodule SimpleAPIWeb.ProductJSON do
  alias SimpleAPI.Products.Product

  @doc """
  Renders a list of products.
  """
  def index(%{products: products}) do
    for(product <- products, do: data(product))
  end

  @doc """
  Renders a single product.
  """
  def show(%{product: product}) do
    data(product)
  end

  defp data(%Product{} = product) do
    %{
      id: product.id,
      name: product.name
    }
  end

  defp data({:from_cache, body}) do
    Jason.Fragment.new(body)
  end
end

(Note the opportune use of Jason.Fragment.new/1, which avoids decoding the individual JSON payload and encoding it again.)

When using a traditional DB, we probably wouldn’t need to do that (especially retrieving products one by one). This might come in handy, however, when hitting external APIs or slower databases, like some NoSQL DBs.

We could do the opposite and cache individual products when processing a list of products. In practice, we’d need to encode the body (easy) and set the headers the way Phoenix would have done for an individual request (harder), as sketched below.
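
As a sketch only, such a hypothetical helper in the controller above could look like this (the header set is minimal, and the synthetic request URL must match what plug_http_cache computes for real requests):

defp cache_individual(product) do
  req = {"GET", url(~p"/api/products/#{product.id}"), [], ""}

  resp =
    {200,
     [{"content-type", "application/json"}, {"cache-control", "public, s-maxage=600"}],
     Jason.encode!(%{id: product.id, name: product.name})}

  :http_cache.cache(req, resp, @http_cache_opts)
end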

Beyond HTTP, http_cache can actually be used to cache any data, GraphQL responses for example. As long as a method, a URI and cache-control headers are set, http_cache won’t care.
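
As an illustration only, one could fabricate a synthetic GET request whose URI is derived from the GraphQL query (the module, URI scheme and TTL here are purely hypothetical):

defmodule GraphQLCache do
  @opts %{store: :http_cache_store_process}

  # Look the query up in the cache, or execute it and cache the JSON result
  def fetch(query, variables, execute_fun) do
    req = synthetic_request(query, variables)

    case :http_cache.get(req, @opts) do
      {:fresh, {ref, {_status, _headers, body}}} ->
        :http_cache.notify_response_used(ref, @opts)
        body

      _ ->
        body = execute_fun.(query, variables)
        :http_cache.cache(req, {200, [{"cache-control", "max-age=60"}], body}, @opts)
        body
    end
  end

  defp synthetic_request(query, variables) do
    key = Base.encode16(:crypto.hash(:sha256, :erlang.term_to_binary({query, variables})), case: :lower)
    {"GET", "http://graphql.internal/" <> key, [], ""}
  end
end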

Conclusion

http_cache is intended to give the BEAM world another option when it comes to HTTP caching, with a solution that:

  • is native BEAM code and doesn’t require additional infrastructure dependencies
  • takes advantage of the BEAM’s clustering capabilities
  • uses standard BEAM tools (telemetry…)
  • conforms as much as possible to RFCs
  • and remains simple

http_cache is not thoroughly load-tested at the time of writing, due to a lack of time and resources. Hopefully most of the work is done to provide a viable HTTP caching layer on the BEAM, but feel free to leave feedback if you’re using it, or if you’ve encountered problems with these libraries. Until then, it’ll remain pre-v1.0.

I’m also working on licensed versions of some of these libraries, available on the excellent codecodeship platform, but that’s a story for another day.