Conquering the Cache Stampede

by Sidney Wijngaarde · November 15, 2020 · 8 min read

The Thundering Herd

You’ve followed everyones advice and built a stateless application. The data isn’t in your app server and therefore, when you receive a request, you query the database. Your code looks something like the following with the database swapped out for simplicity:

// main.go
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

type DB struct{}

func (db *DB) Query(query string) string {
	// These log statements will be useful later on.
	fmt.Println("Query Invoked")
	time.Sleep(200 * time.Millisecond)
	fmt.Println("Query Complete")
	return "data"
}

var db DB

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		val, ok := r.URL.Query()["some_key"]

		if !ok {
			w.WriteHeader(http.StatusBadRequest)
			fmt.Fprintf(w, "invalid request")
			return
		}

		result := db.Query(val[0])
		fmt.Fprintf(w, "query result: %s", result)
	})

	port := ":8080"
	fmt.Printf("Server starting at http://localhost%s\n", port)
	log.Fatal(http.ListenAndServe(port, nil))
}

We can test the server with a quick curl command.

$ curl localhost:8080/?some_key=testing

# Output from server
Server starting at http://localhost:8080
Query Invoked
Query Complete

Time goes on, your service scales, and you start to receive many requests. Each request results in a database query, some of which are slow. Under sufficient load, your latency is unacceptable. You could try to optimize query execution time in the data layer by adding indexes, rewriting and benchmarking queries, migrating your data model, etc. This can be a rewarding and valuable use of time but there is a simpler solution. You can cache the results from the database for a given query in your app server. For some period of time, subsequent requests for the same data would be served from an in-memory cache, reducing the load on the database and your latency. Problem solved right? Your code would look something like this:

// main.go
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync"
	"time"
)

type cacheEntry struct {
	value      string
	expiration time.Time
}

type Cache struct {
	data map[string]cacheEntry
	sync.RWMutex
	ttl time.Duration
}

func (c *Cache) Get(id string) (string, bool) {
	c.RLock()
	defer c.RUnlock()

	if entry, ok := c.data[id]; ok && entry.expiration.After(time.Now()) {
		return entry.value, ok
	}

	return "", false
}

func (c *Cache) Set(id string, val string) {
	c.Lock()
	defer c.Unlock()

	c.data[id] = cacheEntry{val, time.Now().Add(c.ttl)}
}

type SlowDB struct {
	sync.Mutex
}

func (db *SlowDB) Query(query string) string {
	db.Lock()
	defer db.Unlock()

	fmt.Println("Query Invoked")
	time.Sleep(5 * time.Second)
	fmt.Println("Query Complete")
	return "data"
}

var cache Cache = Cache{data: map[string]cacheEntry{}, ttl: time.Second * 30}
var db SlowDB

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		val, ok := r.URL.Query()["some_key"]
		if !ok {
			w.WriteHeader(http.StatusBadRequest)
			fmt.Fprintf(w, "invalid request")
			return
		}

		if result, ok := cache.Get(val[0]); ok {
			fmt.Fprintf(w, "cached result: %s", result)
			return
		}

		result := db.Query(val[0])
		cache.Set(val[0], result)
		fmt.Fprintf(w, "query result: %s", result)
	})

	port := ":8080"
	fmt.Printf("Server starting at http://localhost%s\n", port)
	log.Fatal(http.ListenAndServe(port, nil))
}

To test out server we can write a simple bash one-liner to request data and observe the results.

$ for i in {1..5}; do curl localhost:8080/?some_key=testing; done;

# Output from server
Server starting at http://localhost:8080
Query Invoked
Query Complete
Cache hit
Cache hit
Cache hit
Cache hit

This approach will likely work at low traffic when the application and the database are not under sufficient load. However, there is an issue. What happens when multiple concurrent requests are for the same data and the data hasn’t been cached? It’s worth noting that the http library will run the request handlers in separate goroutines so it is possible and likely that multiple calls to the database happen simultaneously. The reality is different from what you might expect. You might think that the first request populates the cache from the database and the others serve from the cache. What actually happens is we have 5 requests to the database for the same data.

Notice I’ve added an & to the curl command below to run them all at concurrently.

# Note this adds an "&" to run all queries at the same time in the background
$ for i in {1..5}; do curl localhost:8080/?some_key=testing & done;

# Output from server
Server starting at http://localhost:8080
Query Invoked
Query Complete
Query Invoked
Query Complete
Query Invoked
Query Complete
Query Invoked
Query Complete
Query Invoked
Query Complete

All the goroutines check the cache, the value is not present, and all goroutines query the database to populate the cache. This is known as “The Thundering Herd” or Cache Stampede. What’s more unfortunate is that this issue surfaces under sufficient load and therefore when the data layer can least afford taking on extra work, it receives multiple requests for the same data. Facebook released a video covering this topic a few years ago that’s worth a watch.

The Solution

How do we fix this? We need to coordinate our concurrent go routines such that only one routine populates the cache for a given key at a time. Other routines should wait for the cache to populate and then serve from it. To make this easier, we’ll push the population logic into the cache since it’s already shared by multiple routines. We’ll accept a loading function that queries the database given a key. We can hide decisions about loading from the database or reading from the cache from the caller and offer a single Get method.

Before we dive into the implementation, let’s go over the high level idea. We are going to create a channel when we populate a cache entry. If that channel exists, other go routines will wait for us to finish populating the value. Once we populate the value, close the channel and all other goroutines will return the value from the cache.

The code for the cache is lengthier than before and thus I’ve broken it out into it’s own file. The Get method contains the primary logic.

// cache.go
package main

import (
	"fmt"
	"sync"
	"time"
)

type Cache struct {
	sync.Mutex
	entries map[string]cacheEntry
	ttl     time.Duration
}

type cacheEntry struct {
	value      interface{}
	expiration time.Time
	loadch     chan struct{}
	err        error
}

type Loader func(key string) (interface{}, error)

func NewCache(ttl time.Duration) *Cache {
	return &Cache{entries: map[string]cacheEntry{}, ttl: ttl}
}

func (c *Cache) Get(key string, loader Loader) (interface{}, error) {
	value, shouldLoad, loadCh, err := c.tryCached(key)

	if value != nil || err != nil {
		fmt.Println("Cache hit")
		return value, err
	}

	// This goroutine can populate from the database.
	if shouldLoad {
		if loader == nil {
			return nil, fmt.Errorf("load method for cache get is nil")
		}

		return c.tryLoad(key, loader)
	}

	// Loading is in progress. The channel will close once loading is complete.
	<- loadCh;

	value, _, _, err = c.tryCached(key)
	fmt.Println("Cache hit")
	return value, err
}

func (c *Cache) tryCached(key string) (interface{}, bool, chan struct{}, error) {
	c.Lock()
	defer c.Unlock()

	entry, ok := c.entries[key]

	// Loading is in progress. Tell the caller to wait on the channel.
	if ok && entry.loadch != nil {
		return nil, false, entry.loadch, nil
	}

	// Cache hit case.
	if ok && entry.expiration.After(time.Now()) {
		return entry.value, false, entry.loadch, entry.err
	}

	// Entry either doesn't exist or is expired.
	entry.loadch = make(chan struct{})
	c.entries[key] = entry

	return nil, true, entry.loadch, nil
}

func (c *Cache) tryLoad(key string, loader Loader) (interface{}, error) {
	// Load outside of the lock.
	value, err := loader(key)
	c.Lock()
	defer c.Unlock()

	// Create or update the cache entry
	entry := c.entries[key]
	entry.value, entry.err = value, err
	entry.expiration = time.Now().Add(c.ttl)

	// Signal to other callers that the load is complete.
	close(entry.loadch)
	entry.loadch = nil
	c.entries[key] = entry
	return value, err
}

There are some issues with this cache implementation if one were to directly take this into production, namely, there is no eviction policy, and potentially heavy lock contention. Check out this article on caching in go if you want to learn more. With that said this cache illustrates the approach for solving the problem.

The main method for our server now needs to set up the database query as part of the loader function. It should look something like this.

// main.go

// left out setup for brevity
// ...
func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		val, ok := r.URL.Query()["some_key"]
		// left out ok check

		result, _ := cache.Get(val[0], func(key string) (interface{}, error) {
			return db.Query(key), nil
		})

		fmt.Fprintf(w, "result: %v\n", result)
	})
	// ...
}

If we run our concurrent curl commands we’ll see that only the first request queries data from the database!

$ for i in {1..5}; do curl localhost:8080/?some_key=testing & done;

# Output from server
Server starting at http://localhost:8080
Query Invoked
Query Complete
Cache hit
Cache hit
Cache hit
Cache hit

We addressed the issue for a single process. In a distributed setting with an external cache like Redis the solution is analogous. There is already a great article on this by Redis Labs.

Thanks for reading and please let me know what you think in the comments.

The Thundering Herd

The Solution

Join my email list