Streaming SSR with Suspense
The Problem with Traditional SSR
Traditional SSR has a brutal constraint: the server must wait for ALL data before sending ANY HTML. If your page needs data from three APIs and one takes 3 seconds, the user stares at a blank screen for 3 seconds — even though the other two APIs responded in 50ms.
Traditional SSR Timeline:
Server receives request
├── Fetch user profile (50ms) ✓ done
├── Fetch product data (80ms) ✓ done
└── Fetch recommendations (3000ms) ... waiting ...
... still waiting ...
✓ done (3000ms)
Server renders full HTML → sends to browser
Browser receives HTML → user sees content
Total wait: 3000ms (bottlenecked by slowest fetch)
The fast APIs are done in under 100ms, but their content is held hostage by the slow one. The user gets nothing until everything is ready.
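The bottleneck is easy to see in code. A minimal sketch, with `delay()` standing in for the three hypothetical API calls: the render cannot start until `Promise.all` settles, so the fast responses wait on the slowest one.

```javascript
// Traditional SSR in miniature: nothing renders until the slowest
// fetch settles. delay() is a stand-in for real API calls.
const delay = (ms, value) => new Promise(r => setTimeout(() => r(value), ms))

async function traditionalSSR() {
  const started = Date.now()
  const [profile, product, recs] = await Promise.all([
    delay(5, 'profile'),  // fast
    delay(8, 'product'),  // fast
    delay(50, 'recs')     // slow: gates the whole response
  ])
  // The first byte of HTML can only be produced here, ~50ms in
  const html = `<html>${profile}${product}${recs}</html>`
  return { html, waitedMs: Date.now() - started }
}
```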
Traditional SSR is like a restaurant that won't bring any food until every dish for the table is ready. Your salad has been sitting under a heat lamp for 20 minutes while the kitchen finishes the steak. Streaming SSR is a restaurant that brings each dish as it's ready — salad first, then sides, then the steak. You start eating immediately.
How Streaming Works
Streaming SSR flips the model: send HTML chunks as they become ready, not all at once.
The server immediately sends the HTML shell — navigation, layout, loading skeletons — then streams in content chunks as each data source resolves. The browser progressively renders these chunks without waiting for the full response.
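A framework-free sketch of that model, with `streamPage()` and its data sources as hypothetical stand-ins: the shell is written before any fetch resolves, and each section's chunk is written the moment its own fetch completes.

```javascript
// Write the shell immediately, then one chunk per data source as each
// one resolves. write() receives chunks in completion order.
async function streamPage(write, sources) {
  write('<html><body><nav>...</nav>')  // shell: sent before any data
  await Promise.all(
    Object.entries(sources).map(async ([id, fetcher]) => {
      const html = await fetcher()
      // Each chunk is tagged with the section it belongs to
      write(`<template data-for="${id}">${html}</template>`)
    })
  )
  write('</body></html>')  // close once every section has streamed
}
```

Because chunks go out in completion order, a slow source delays only its own section, which is exactly the behavior the timeline below describes.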
Streaming SSR Timeline:
Server receives request
├── Send HTML shell immediately (nav, layout, skeletons) → browser renders!
├── Fetch user profile (50ms) ✓ → stream chunk 1 → browser updates!
├── Fetch product data (80ms) ✓ → stream chunk 2 → browser updates!
└── Fetch recommendations (3000ms) ✓ → stream chunk 3 → browser updates!
User sees shell at: ~0ms
User sees profile + product: ~80ms
User sees recommendations: ~3000ms
The user sees useful content within milliseconds, not seconds. The slow API only delays its own section.
Suspense as Streaming Boundaries
In React, Suspense boundaries define where the stream can break. Each Suspense boundary is a potential streaming boundary — the server can send the fallback immediately and stream in the resolved content later.
import { Suspense } from 'react'

export default async function ProductPage({ params }) {
  return (
    <main>
      <NavBar />
      <Suspense fallback={<ProductSkeleton />}>
        <ProductDetails slug={params.slug} />
      </Suspense>
      <Suspense fallback={<ReviewsSkeleton />}>
        <Reviews slug={params.slug} />
      </Suspense>
      <Suspense fallback={<RecommendationsSkeleton />}>
        <Recommendations slug={params.slug} />
      </Suspense>
    </main>
  )
}

async function ProductDetails({ slug }) {
  const product = await fetchProduct(slug)
  return <div>{product.name} — ${product.price}</div>
}

async function Reviews({ slug }) {
  const reviews = await fetchReviews(slug)
  return <ul>{reviews.map(r => <li key={r.id}>{r.text}</li>)}</ul>
}

async function Recommendations({ slug }) {
  const recs = await fetchRecommendations(slug)
  return <div>{recs.map(r => <RecCard key={r.id} {...r} />)}</div>
}
The server sends NavBar and all three skeleton fallbacks immediately. As each async component resolves, React streams a chunk that replaces the skeleton with real content.
The Wire Format: How Chunks Replace Fallbacks
Here is the part that most tutorials skip. When React streams in a resolved Suspense boundary, it doesn't send a separate HTTP request or use WebSockets. It's all one HTTP response — a single, long-lived connection using chunked transfer encoding.
The initial HTML contains placeholder elements:
<!--$?-->
<template id="B:0"></template>
<div>Loading reviews...</div>
<!--/$-->
When the reviews data resolves, React streams an additional chunk at the bottom of the response:
<div hidden id="S:0">
  <ul><li>Great product!</li><li>Highly recommend</li></ul>
</div>
<script>
  // Swap the fallback with the resolved content
  $RC("B:0", "S:0")
</script>
The inline $RC function (React's own tiny runtime) finds the template marker B:0, removes the fallback, and inserts the hidden content S:0 in its place. This is a pure DOM operation — no React hydration needed for the swap itself.
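React's real runtime walks DOM nodes between the comment markers; the same idea can be sketched over plain strings. `swapFallback()` below is a simplified illustration, not React's code: it pairs a boundary marker with its streamed content by ID and splices the content into place.

```javascript
// Simplified model of the $RC swap, over strings instead of DOM nodes.
// Assumes the streamed content has no nested </div>; React's real
// version operates on the DOM and has no such limitation.
function swapFallback(html, boundaryId, contentId) {
  // Find the hidden streamed content by its ID
  const content = html.match(
    new RegExp(`<div hidden id="${contentId}">([\\s\\S]*?)</div>`)
  )
  if (!content) return html

  // Replace everything from the boundary's template marker up to the
  // closing comment with the resolved content
  const boundary = new RegExp(
    `<template id="${boundaryId}"></template>[\\s\\S]*?<!--/\\$-->`
  )
  return html
    .replace(boundary, content[1] + '<!--/$-->')
    .replace(content[0], '')  // drop the now-empty hidden container
}
```

Because the lookup is by ID, a chunk for a later boundary can arrive first and still land in the right place.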
Out-of-order streaming
Chunks can arrive in any order. If recommendations resolve before reviews, React streams the recommendations chunk first. The $RC function uses IDs to find the correct placeholder, so it doesn't matter what order chunks arrive — each one knows exactly where it belongs. This is how React achieves out-of-order streaming without any coordination between Suspense boundaries.
The React APIs
React provides two streaming APIs depending on your runtime:
Node.js: renderToPipeableStream
import { renderToPipeableStream } from 'react-dom/server'

function handleRequest(req, res) {
  const { pipe, abort } = renderToPipeableStream(<App />, {
    bootstrapScripts: ['/client.js'],
    onShellReady() {
      res.statusCode = 200
      res.setHeader('Content-Type', 'text/html')
      pipe(res)
    },
    onShellError(error) {
      res.statusCode = 500
      res.setHeader('Content-Type', 'text/html')
      res.end('<!doctype html><p>Server error</p>')
    },
    onError(error) {
      console.error(error)
    }
  })
  // Give up on server rendering after 10s; the client takes over
  setTimeout(() => abort(), 10000)
}
Edge/Web: renderToReadableStream
import { renderToReadableStream } from 'react-dom/server'

async function handleRequest(req) {
  const stream = await renderToReadableStream(<App />, {
    bootstrapScripts: ['/client.js'],
    signal: AbortSignal.timeout(10000)
  })
  return new Response(stream, {
    headers: { 'Content-Type': 'text/html' }
  })
}
The key callbacks:
onShellReady— The shell (everything outside Suspense boundaries) is rendered. Start piping.onShellError— The shell itself failed to render. Send a fallback error page.onAllReady— Everything, including all Suspense boundaries, is resolved. Used for static generation (you want the complete HTML).onError— An error occurred in a Suspense boundary. The fallback stays; the boundary doesn't resolve.
Streaming and SEO
A common concern: "If the initial HTML has skeletons, will search engines see the real content?"
Modern crawlers such as Googlebot handle streaming well. They wait for the full response and process the final DOM state, including all streamed chunks. The $RC script swaps happen before the crawler evaluates the page.
However, if you're concerned about crawlers that don't execute JavaScript, you can use onAllReady instead of onShellReady for bot user agents:
const { pipe } = renderToPipeableStream(<App />, {
  onShellReady() {
    if (isBot(req)) return
    res.statusCode = 200
    pipe(res)
  },
  onAllReady() {
    if (!isBot(req)) return
    res.statusCode = 200
    pipe(res)
  }
})
Bots get the full HTML in one shot. Real users get streaming.
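Note that isBot in the snippet above is not a React API; you supply it yourself. A minimal sketch using a hand-rolled user-agent regex (production code typically relies on a maintained list, such as the isbot npm package):

```javascript
// Rough user-agent sniffing: matches common crawler keywords only.
// A maintained list catches far more; this is just a sketch.
const BOT_UA = /bot|crawler|spider|slurp|bingpreview|facebookexternalhit/i

function isBot(req) {
  const ua = req.headers['user-agent'] || ''
  return BOT_UA.test(ua)
}
```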
Don't nest Suspense boundaries deeply in the hope that finer granularity improves streaming. Each boundary adds overhead — a template marker, a hidden div, and a swap script. For most pages, 2-4 Suspense boundaries at the top level is ideal: navigation, hero content, main content, sidebar/comments. Over-granular boundaries create unnecessary DOM complexity and more inline scripts.
Streaming in Next.js App Router
In Next.js, streaming is the default for async Server Components wrapped in Suspense. You don't call renderToPipeableStream directly — Next.js handles it.
import { Suspense } from 'react'
import { LoadingSkeleton } from '@/components/ui'

export default async function Dashboard() {
  return (
    <div>
      <h1>Dashboard</h1>
      <Suspense fallback={<LoadingSkeleton />}>
        <Analytics />
      </Suspense>
      <Suspense fallback={<LoadingSkeleton />}>
        <RecentActivity />
      </Suspense>
    </div>
  )
}
You can also use loading.tsx files, which create an implicit Suspense boundary at the route segment level:
app/
  dashboard/
    loading.tsx   ← Suspense fallback for this route
    page.tsx      ← async Server Component

The loading.tsx content displays instantly while page.tsx resolves.
| What developers do | What they should do |
|---|---|
| Wait for onAllReady before piping for real users. onAllReady waits for every Suspense boundary, eliminating the entire benefit of streaming; users get a slow, traditional SSR experience. | Use onShellReady for streaming to users, and onAllReady only for bots or static generation. |
| Put the entire page inside a single Suspense boundary. One big boundary means nothing streams until everything resolves; you're back to the traditional SSR bottleneck. | Wrap individual sections that have their own data dependencies in separate Suspense boundaries. |
| Assume streaming requires WebSockets or Server-Sent Events. It's a regular HTTP response that stays open; the server sends HTML chunks over the same connection. No special protocols needed. | Streaming SSR uses standard HTTP chunked transfer encoding in a single response. |
| Skip fallback design because skeletons are temporary. Skeletons are the first thing users see; if they differ in size from the real content, you get layout shift when content streams in. | Design polished skeletons that match the content layout to prevent cumulative layout shift (CLS). |
1. Traditional SSR waits for the slowest data source. Streaming SSR sends the shell immediately and streams chunks as data resolves.
2. Each Suspense boundary is a potential streaming boundary — the fallback is sent first, resolved content is streamed later.
3. Chunks are sent as HTML with inline scripts that swap fallbacks — it's one HTTP response, not multiple requests.
4. Out-of-order streaming lets fast data sources resolve independently of slow ones.
5. Use onShellReady for streaming to users. Use onAllReady for bots and static generation.
6. In Next.js, async Server Components inside Suspense boundaries stream automatically.