
Caching Strategies in Depth

Advanced · 18 min read

The Fastest Request Is the One You Never Make

Every millisecond your user waits, engagement drops. Study after study confirms it — Google found that a 100ms increase in search latency reduced traffic by 0.2%. Amazon measured a 1% revenue loss per 100ms. The single most effective way to eliminate latency is to not make the request at all.

That is what caching does. But "caching" is not one thing — it is a stack of layers, each with different lifetimes, eviction rules, and failure modes. Most developers understand caching at a surface level ("set some headers, maybe use a CDN"). Top 1% engineers understand the full picture: which layer serves a request, why, and what happens when they conflict.

Mental Model

Think of caching like a series of concentric security checkpoints at an airport. Your request starts at the outermost ring (browser memory) and only reaches the next checkpoint (disk cache, then CDN, then origin server) if the previous one couldn't serve it. Each layer is progressively slower but more authoritative. The goal is to satisfy as many requests as possible at the outermost, fastest layer.

The Caching Stack

Before diving into individual layers, here is the full stack a browser request can pass through, from fastest to slowest:

  1. Memory cache (renderer RAM, current session only)
  2. Disk cache (persists across tab closes and restarts)
  3. Service Worker cache (programmable, backed by the Cache API)
  4. CDN edge cache (shared by all users in a region)
  5. Origin server (the authoritative source)

Every layer has trade-offs. Let's walk through each one.

Browser Memory Cache vs Disk Cache

The browser maintains two internal caches that most developers never think about directly.

Memory cache lives in the renderer process's RAM. It is blazing fast (sub-millisecond lookups) and stores resources that the current page session has already fetched — scripts, stylesheets, images. When you load a page and it references styles.css, the first fetch goes to the network. Every subsequent reference to that same file within the session hits memory cache. Memory cache is cleared when you close the tab.

Disk cache persists to the hard drive. It is slower than memory (filesystem I/O) but survives tab closes, browser restarts, and even system reboots. The browser decides what goes into disk cache based on resource size, type, and HTTP caching headers.

You do not control the split between memory and disk cache directly — the browser makes that decision. But you control what gets cached and for how long through HTTP headers.

How to see which cache served a request

Open Chrome DevTools, go to the Network tab, and look at the Size column. You will see (memory cache), (disk cache), or the actual byte size (meaning it went to the network). This tells you exactly which layer served each resource.

Quiz
You load a page, then click a link to another page on the same site that uses the same CSS file. Where does the CSS come from on the second page?

Cache-Control: The Master Header

Cache-Control is the single most important HTTP header for caching. It tells the browser (and any intermediate caches like CDNs) exactly how to handle a response.

The Essential Directives

Cache-Control: max-age=31536000, immutable
Cache-Control: no-cache
Cache-Control: no-store
Cache-Control: max-age=0, must-revalidate
Cache-Control: max-age=3600, stale-while-revalidate=86400

Here is what each directive actually means:

max-age=N — The response is fresh for N seconds. During this window, the browser serves it from cache without contacting the server at all. No network request. Zero latency. max-age=31536000 means "cache for one year."

immutable — Even if the user hard-refreshes, do not revalidate. This is critical for content-hashed assets like app.a1b2c3.js — the filename changes when the content changes, so there is never a reason to revalidate.

no-cache — This is the most misunderstood directive. It does NOT mean "do not cache." It means "you can cache it, but you must revalidate with the server before using it." Every request hits the network, but if the server says "nothing changed" (304), no response body is transferred.

no-store — This actually means "do not cache." The browser must not store the response anywhere — not memory, not disk. Use this for sensitive data like banking pages, authentication tokens, or personally identifiable information.

must-revalidate — Once the response becomes stale (past max-age), the cache must not serve it without revalidating. Without this, some caches might serve stale content in certain edge cases (like when the user is offline).

stale-while-revalidate=N — This is the game-changer for perceived performance. After max-age expires, the browser can still serve the stale response immediately while fetching a fresh copy in the background. The user gets instant content, and the cache is silently refreshed for the next request.
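The freshness math behind these directives is simple enough to sketch. Here is a toy model (not the browser's actual implementation) of how a cache decides between serving from cache, serving stale while refreshing, and revalidating first:

```javascript
// Toy model of the max-age / stale-while-revalidate freshness window.
// ageSeconds: how long ago the response was stored.
// Returns 'fresh' (serve from cache, no network), 'stale-but-usable'
// (serve stale immediately, revalidate in the background), or
// 'stale' (must revalidate before serving).
function cacheState(ageSeconds, maxAge, staleWhileRevalidate = 0) {
  if (ageSeconds <= maxAge) return 'fresh'
  if (ageSeconds <= maxAge + staleWhileRevalidate) return 'stale-but-usable'
  return 'stale'
}
```

With `Cache-Control: max-age=3600, stale-while-revalidate=86400`, a response that is 30 minutes old is `'fresh'`, one that is 2 hours old is `'stale-but-usable'`, and one past the 25-hour mark is `'stale'`.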

Directive                                  | Caches response?       | Revalidates?                                | Best for
-------------------------------------------|------------------------|---------------------------------------------|------------------------------------------
max-age=31536000, immutable                | Yes, for 1 year        | Never (even on refresh)                     | Hashed static assets (JS, CSS, images)
no-cache                                   | Yes                    | Every request                               | HTML pages, API responses that change
no-store                                   | No                     | N/A                                         | Sensitive data (auth, banking, PII)
max-age=3600, stale-while-revalidate=86400 | Yes, fresh for 1h      | After 1h: serves stale + background refresh | Semi-dynamic content (blog posts, docs)
max-age=0, must-revalidate                 | Yes (but always stale) | Every request                               | Content that must be current but benefits from 304s
What developers do vs. what they should do:

Mistake: Using no-cache to prevent caching
Why it fails: no-cache still stores the response — it just forces revalidation on every request. no-store is the only directive that prevents storage entirely.
Instead: Use no-store to truly prevent caching.

Mistake: Setting max-age=31536000 on HTML files
Why it fails: HTML files are your entry point. If they are cached for a year, users cannot get updated JS/CSS references. Cache the assets aggressively, but keep HTML fresh.
Instead: Use no-cache or a short max-age on HTML files.

Mistake: Using max-age without immutable on hashed assets
Why it fails: Without immutable, a hard refresh (Cmd+Shift+R) revalidates every asset. With immutable, even a hard refresh serves from cache — because the hash guarantees the content has not changed.
Instead: Always pair max-age with immutable for content-hashed files.
Quiz
A response has Cache-Control: no-cache. A user visits the page. What happens on their second visit?

ETag and Conditional Requests

When the browser revalidates a cached response, it needs a way to ask: "Has this changed?" That is where ETags come in.

An ETag (Entity Tag) is a unique identifier for a specific version of a resource. The server generates it — typically a hash of the response body, or a version string.

Here is the flow:

  1. Browser requests /api/products
  2. Server responds with the data and includes ETag: "abc123"
  3. Browser caches the response along with the ETag
  4. On the next request, the browser sends If-None-Match: "abc123"
  5. Server checks: has the resource changed? If not, it responds 304 Not Modified — no body, saving bandwidth
  6. If the resource did change, the server responds 200 with the new body and a new ETag
First response (fresh data plus an ETag):

HTTP/1.1 200 OK
Cache-Control: no-cache
ETag: "a1b2c3d4"
Content-Type: application/json

{"products": [...]}

The next request revalidates:

GET /api/products HTTP/1.1
If-None-Match: "a1b2c3d4"

And when nothing has changed:

HTTP/1.1 304 Not Modified
ETag: "a1b2c3d4"

The 304 response is tiny — just headers, no body. For a 500KB JSON response, this turns a 500KB transfer into a few hundred bytes.

ETag vs Last-Modified

Last-Modified is the older mechanism. The server sends Last-Modified: Wed, 09 Apr 2025 10:00:00 GMT, and the browser sends If-Modified-Since on subsequent requests. ETags are more precise because they compare content, not timestamps. A file that was re-deployed at a new timestamp but with identical content gets a 304 with ETags but a 200 with Last-Modified.

Strong vs weak ETags

A strong ETag (default) means byte-for-byte identical. A weak ETag, prefixed with W/ like W/"abc123", means semantically equivalent — the content might differ in insignificant ways (like whitespace). Most servers use strong ETags.
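The two comparison modes can be sketched as small helpers (weak comparison ignores the W/ prefix on both sides; strong comparison requires that neither tag is weak):

```javascript
// Weak ETag comparison: strip any W/ prefix, then compare the opaque tags.
function weakMatch(a, b) {
  const strip = (tag) => tag.replace(/^W\//, '')
  return strip(a) === strip(b)
}

// Strong comparison: both tags must be strong AND byte-identical.
function strongMatch(a, b) {
  const isWeak = (tag) => tag.startsWith('W/')
  return !isWeak(a) && !isWeak(b) && a === b
}
```

So `W/"abc123"` weakly matches `"abc123"` but does not strongly match it — which is why weak ETags are fine for cache revalidation but not for byte-exact operations like range requests.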

Service Worker Cache Strategies

Service Workers give you a programmable network proxy that sits between your page and the network. Unlike HTTP caching headers (which are declarative), Service Workers are imperative — you write JavaScript code that intercepts every fetch and decides how to handle it.

This is where caching gets truly powerful. There are five canonical strategies:

Cache-First (Cache Falling Back to Network)

self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.match(event.request).then((cached) => {
      return cached || fetch(event.request).then((response) => {
        const clone = response.clone()
        caches.open('v1').then((cache) => cache.put(event.request, clone))
        return response
      })
    })
  )
})

Check the cache first. If the resource is cached, return it immediately — zero network. If not, fetch from network, cache the response for next time, and return it.

Best for: Static assets that rarely change (fonts, icons, images, hashed JS/CSS).

Network-First (Network Falling Back to Cache)

self.addEventListener('fetch', (event) => {
  event.respondWith(
    fetch(event.request)
      .then((response) => {
        const clone = response.clone()
        caches.open('v1').then((cache) => cache.put(event.request, clone))
        return response
      })
      .catch(() => caches.match(event.request))
  )
})

Try the network first. If it succeeds, cache the response and return it. If the network fails (offline, timeout), fall back to the cached version.

Best for: Dynamic content that should be fresh when possible but available offline (news feeds, API responses, user data).

Stale-While-Revalidate

self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.match(event.request).then((cached) => {
      const fetchPromise = fetch(event.request).then((response) => {
        const clone = response.clone()
        caches.open('v1').then((cache) => cache.put(event.request, clone))
        return response
      })
      return cached || fetchPromise
    })
  )
})

Return the cached version immediately (instant perceived speed), then fetch an update in the background. The next visit gets the fresh version.

Best for: Content that updates occasionally but does not need to be perfectly current on every load (blog posts, product listings, documentation).

Cache-Only

self.addEventListener('fetch', (event) => {
  event.respondWith(caches.match(event.request))
})

Only serve from cache. If not cached, the request fails. Used for resources that were pre-cached during Service Worker installation.

Best for: App shell components in offline-first PWAs.

Network-Only

self.addEventListener('fetch', (event) => {
  event.respondWith(fetch(event.request))
})

Always go to the network. Never cache. This is equivalent to not having a Service Worker for this route.

Best for: Non-GET requests, analytics pings, real-time data.

Strategy               | Speed               | Freshness                | Offline support        | Best for
-----------------------|---------------------|--------------------------|------------------------|--------------------------------------
Cache-First            | Instant (if cached) | Can be stale             | Full                   | Static assets, fonts, images
Network-First          | Network speed       | Always fresh             | Graceful fallback      | API data, dynamic content
Stale-While-Revalidate | Instant (if cached) | Updated next visit       | Partial                | Docs, blog posts, listings
Cache-Only             | Instant             | Only what was pre-cached | Full (pre-cached only) | PWA app shell
Network-Only           | Network speed       | Always fresh             | None                   | Analytics, mutations, real-time data
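In practice you rarely use one strategy for the whole app — you route requests to strategies. Here is a sketch of that dispatch logic as a pure function (the URL patterns are hypothetical; adjust them to your own scheme):

```javascript
// Map a request to a caching strategy, following the table above.
// Patterns are illustrative: content-hashed assets get cache-first,
// API calls get network-first, everything else gets stale-while-revalidate.
function strategyFor(url, method) {
  if (method !== 'GET') return 'network-only' // never cache mutations
  const { pathname } = new URL(url)
  if (/\.[0-9a-f]{6,}\.(js|css|woff2?|png|svg)$/.test(pathname)) {
    return 'cache-first' // content-hashed assets never change in place
  }
  if (pathname.startsWith('/api/')) return 'network-first'
  return 'stale-while-revalidate' // HTML, docs, listings
}
```

A fetch handler can then stay thin: look up the strategy, delegate to the matching handler from the examples above.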
Quiz
Your app shows a product catalog that updates a few times per day. Users complain about slow page loads. Which Service Worker strategy gives the fastest perceived load while keeping content reasonably fresh?

The Cache API

The Service Worker strategies above use the Cache API under the hood. It is a key-value store where keys are Request objects and values are Response objects.

// Open (or create) a named cache
const cache = await caches.open('my-cache-v1')

// Store a response under a request key
await cache.put(request, response)

// Look up a cached response (resolves to undefined on a miss)
const cached = await cache.match(request)

// Remove a single entry
await cache.delete(request)

// List every cached request
const keys = await cache.keys()

Key things to know about the Cache API:

  • It is separate from the HTTP cache. A resource can be in the Cache API, the HTTP cache, both, or neither.
  • It is not size-limited the same way HTTP cache is. But browsers do impose storage quotas (typically 50-80% of available disk space shared across all storage APIs).
  • You must clone responses before caching. A Response body can only be consumed once. If you read it (to return to the page) and then try to cache it, the body is already empty.
  • Cache matching is by URL by default. Use ignoreSearch: true to match regardless of query parameters, or ignoreVary: true to ignore Vary headers.
Common Trap

The Cache API does not automatically expire entries. If you cache 10,000 requests and never delete them, they stay forever (until the browser evicts them under storage pressure). You must implement your own eviction strategy — typically by versioning your cache names (v1, v2) and deleting old caches during the activate event.
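A common eviction pattern (sketched below; the cache names are illustrative) is to bump a version string on every deploy and delete everything else during activate. The name filtering is a pure function, kept separate from the Service Worker globals:

```javascript
const CACHE_VERSION = 'v2'
const CURRENT_CACHES = [`static-${CACHE_VERSION}`, `pages-${CACHE_VERSION}`]

// Pure helper: given all existing cache names, which are obsolete?
function obsoleteCaches(allNames, keep = CURRENT_CACHES) {
  return allNames.filter((name) => !keep.includes(name))
}

// In the Service Worker, delete old versions once the new worker activates.
// (The typeof guard just lets this file also load outside a worker.)
if (typeof self !== 'undefined' && typeof caches !== 'undefined') {
  self.addEventListener('activate', (event) => {
    event.waitUntil(
      caches.keys().then((names) =>
        Promise.all(obsoleteCaches(names).map((name) => caches.delete(name)))
      )
    )
  })
}
```

On deploy, bumping CACHE_VERSION to v3 makes every v2 cache obsolete; the next activate sweeps them away.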

CDN Caching

A CDN (Content Delivery Network) is a geographically distributed network of servers that cache your content closer to your users. When a user in Tokyo requests your site hosted in Virginia, the CDN serves it from a Tokyo edge node instead of making the full round trip across the Pacific.

CDN caching operates at the edge — between the browser and your origin server. CDN edge nodes respect Cache-Control headers from your origin, but most CDNs also support their own caching rules via configuration or custom headers.

CDN vs Browser Cache

             | Browser cache                      | CDN cache
-------------|------------------------------------|------------------------------
Scope        | Single user                        | All users in a region
Location     | User's device                      | Edge server near user
Control      | Cache-Control headers              | Headers + CDN config
Invalidation | User clears cache / headers expire | Purge API / TTL expiry
Best for     | User-specific, repeat visits       | Shared content, first visits

CDN Cache-Control: The s-maxage Directive

s-maxage (shared max-age) sets the cache duration specifically for shared caches (CDNs, reverse proxies) without affecting the browser cache.

Cache-Control: max-age=0, s-maxage=86400

This says: "Browsers, always revalidate. CDN, cache for 24 hours." The CDN serves cached content to all users in a region for 24 hours, while each user's browser checks with the CDN on every visit (getting a fast response from the nearby edge node).
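To see how a shared cache reads such a header, here is a toy parser (real caches handle more of the grammar, such as quoted values and extension directives, but the shape is the same) that picks the TTL a CDN would use — s-maxage wins over max-age for shared caches:

```javascript
// Split a Cache-Control header into a directive map.
// Valueless directives (no-store, immutable) map to true.
function parseCacheControl(header) {
  const directives = {}
  for (const part of header.split(',')) {
    const [key, value] = part.trim().split('=')
    directives[key.toLowerCase()] = value === undefined ? true : Number(value)
  }
  return directives
}

// TTL a shared cache (CDN, reverse proxy) uses: s-maxage if present,
// otherwise fall back to max-age.
function sharedTtl(header) {
  const d = parseCacheControl(header)
  return d['s-maxage'] ?? d['max-age'] ?? 0
}
```

For `max-age=0, s-maxage=86400` this yields a 24-hour TTL at the edge, even though the browser's own TTL is zero.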

Vary header and CDN caching

The Vary header tells caches to store separate versions based on request headers. Vary: Accept-Encoding is fine (most CDNs handle it). But Vary: Cookie or Vary: Authorization effectively disables CDN caching because every user gets a different cache entry. Be intentional about what you Vary on.

Next.js Caching Layers

Next.js 15 has its own multi-layered caching system that sits on top of browser and CDN caching. Understanding these layers is critical for anyone building with Next.js, because misconfigured caching is the number one source of "my page is stale" bugs.

Router Cache (Client-Side)

When you navigate between routes in a Next.js app, React Server Component payloads are cached in memory on the client. Prefetched routes (via next/link) are also stored here. This means navigating back to a previously visited page is instant.

Key behaviors:

  • Dynamic routes are not reused by default in Next.js 15 (staleTimes.dynamic defaults to 0; in Next.js 14 they were cached for 30 seconds)
  • Static (prefetched) routes are cached for 5 minutes
  • Hard refresh clears it entirely
  • router.refresh() clears the cache for the current route

Full Route Cache (Server-Side)

At build time, Next.js renders static routes and caches the HTML and RSC payload on the server. Subsequent requests serve this cached result without re-rendering.

This only applies to routes that can be statically determined at build time. You opt out by using dynamic functions like cookies(), headers(), or searchParams.

Data Cache

In Next.js 15, fetch() responses in Server Components are no longer cached by default — you opt in with cache: 'force-cache' or a revalidate option. Once opted in, Next.js stores the response in its Data Cache. This cache persists across requests and even deployments — meaning a cached API response from yesterday is still served today unless you explicitly revalidate.

const data = await fetch('https://api.example.com/products', {
  next: { revalidate: 3600 }
})

This caches the response for one hour. After one hour, the next request triggers a background revalidation (similar to stale-while-revalidate at the framework level).

To opt out entirely:

const data = await fetch('https://api.example.com/products', {
  cache: 'no-store'
})

On-Demand Revalidation

Instead of time-based revalidation, you can invalidate caches explicitly:

import { revalidatePath, revalidateTag } from 'next/cache'

revalidatePath('/products')

revalidateTag('products')

Tag-based revalidation is more surgical. Tag a fetch with next: { tags: ['products'] }, and calling revalidateTag('products') invalidates only fetches with that tag.

Quiz
A Next.js page uses fetch() with { next: { revalidate: 60 } }. A user visits the page 90 seconds after the last revalidation. What happens?

Cache Busting with Content Hashing

Here is the fundamental tension with caching: you want assets cached aggressively for performance, but you also want users to get the latest version when you deploy. Cache busting resolves this.

The strategy is simple and elegant:

  1. Give every asset a filename that includes a hash of its content: app.a1b2c3.js
  2. Set Cache-Control: max-age=31536000, immutable on these files
  3. When the content changes, the hash changes, producing a new filename: app.d4e5f6.js
  4. The HTML file (which is NOT cached aggressively) references the new filename
  5. The browser has never seen this new URL before, so it fetches it fresh
Old HTML:  <script src="/app.a1b2c3.js"></script>
New HTML:  <script src="/app.d4e5f6.js"></script>

The old file stays in cache (wasting a bit of disk space, but harmless). The new file is fetched fresh. Users always get the latest code on their next page load.

This is why the caching rules for HTML and assets are fundamentally different:

  • HTML: Cache-Control: no-cache — always revalidate (cheap 304 if unchanged)
  • Hashed assets: Cache-Control: max-age=31536000, immutable — cache forever, never revalidate
Key Rules
  1. HTML pages use no-cache (always revalidate, cheap 304s)
  2. Hashed static assets use max-age=31536000, immutable (cache forever)
  3. no-cache means revalidate, not do-not-cache — use no-store to prevent storage
  4. stale-while-revalidate gives instant loads with background freshness updates
  5. Service Worker cache and HTTP cache are separate systems — a resource can be in one, both, or neither
  6. CDN caching benefits all users in a region; browser caching benefits one user across visits
  7. Next.js has four caching layers — Router Cache, Full Route Cache, Data Cache, Request Memoization
  8. Always clone a Response before caching it in the Cache API — the body can only be consumed once

Putting It All Together: A Real-World Caching Architecture

Here is how a well-architected production app layers caching:

Asset type          | Cache-Control header                        | CDN    | Service Worker
--------------------|---------------------------------------------|--------|------------------
HTML pages          | no-cache                                    | s-maxage=60  | Network-First
Hashed JS/CSS       | max-age=31536000, immutable                 | Same   | Cache-First
Images (hashed)     | max-age=31536000, immutable                 | Same   | Cache-First
API responses       | no-cache, ETag                              | Varies | Stale-While-Revalidate
User-specific data  | no-store                                    | No     | Network-Only
Fonts               | max-age=31536000, immutable                 | Same   | Cache-First

The layering is intentional: HTTP headers provide the baseline, CDN adds geographic distribution, and the Service Worker adds offline capability and fine-grained control.
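The table above can be encoded directly as a header policy that a server or edge function consults when writing responses. A sketch (the asset-type labels are mine):

```javascript
// Cache-Control policy per asset type, mirroring the table above.
const CACHE_POLICY = {
  html: 'no-cache',
  hashedAsset: 'max-age=31536000, immutable', // JS, CSS, images, fonts
  api: 'no-cache', // pair with an ETag for cheap 304s
  userData: 'no-store', // never stored anywhere
}

function cacheControlFor(assetType) {
  const policy = CACHE_POLICY[assetType]
  if (!policy) throw new Error(`No cache policy for asset type: ${assetType}`)
  return policy
}
```

Centralizing the policy in one table keeps HTML, assets, and API responses from drifting apart as routes are added — and makes the caching rules reviewable in a single place.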

What happens when layers conflict? The Service Worker always wins because it intercepts requests before the browser checks the HTTP cache. If your Service Worker returns a cached response, the browser never even looks at Cache-Control headers. This is incredibly powerful but also a common source of bugs — a misconfigured Service Worker can serve stale content indefinitely, ignoring your carefully crafted cache headers. Always version your Service Worker caches and clean up old versions in the activate event.

Quiz
Your team deploys a new version of the app. Users report they still see the old version even after refreshing. The JS files are content-hashed. What is the most likely cause?
Quiz
A Service Worker uses cache-first strategy for all requests, including the API. The API data changes on the server. When will the user see the updated data?
Interview Question

"Walk me through what happens when a browser makes a request for a cached resource. Which layers are checked, in what order, and how do you control each layer?"

Start with the Service Worker — if registered, it intercepts the request before anything else. The Service Worker can respond from its own Cache API, fetch from the network, or combine strategies. If no Service Worker or it passes through, the browser checks its HTTP cache (memory first, then disk). Cache-Control headers determine if the cached response is still fresh. If fresh, it is served immediately. If stale, a conditional request (with If-None-Match or If-Modified-Since) goes to the CDN or origin. The CDN checks its own cache, potentially serving without hitting origin. Only if all cache layers miss does the request reach the origin server. You control each layer through different mechanisms: Service Worker with JavaScript, HTTP cache with Cache-Control headers, CDN with s-maxage and CDN config, origin with your server logic.

What developers do vs. what they should do:

Mistake: Using a cache-first Service Worker strategy for API responses
Why it fails: Cache-first never checks the network if a cached response exists. API data goes stale permanently. Reserve cache-first for immutable assets like hashed JS and fonts.
Instead: Use network-first or stale-while-revalidate for dynamic data.

Mistake: Setting Vary: Cookie on CDN-cached responses
Why it fails: Each unique Cookie header creates a separate cache entry, effectively giving every user their own CDN cache — which defeats the entire purpose of CDN caching.
Instead: Remove Vary: Cookie or use separate cache keys.

Mistake: Forgetting to clone a Response before caching
Why it fails: A Response body is a ReadableStream that can only be consumed once. If you return it to the page and then try to cache it, the body is already empty. Clone it first so you have two copies.
Instead: Always call response.clone() before passing to cache.put().

Mistake: Caching everything in the Service Worker without eviction
Why it fails: The Cache API has no automatic expiration. Without cleanup, storage grows unbounded until the browser forcibly evicts your data — which you cannot control or predict.
Instead: Version your caches and delete old versions in the activate event.