Token-by-Token Rendering
The 50-Tokens-Per-Second Problem
You're building a chat UI. The LLM streams back tokens — subword chunks like "Hello", " there", ",", " how". Maybe 10-50 tokens per second. Your first instinct is obvious:
```tsx
const [messages, setMessages] = useState<Message[]>([]);

function onToken(token: string) {
  setMessages(prev => {
    const updated = [...prev];
    const last = updated[updated.length - 1];
    updated[updated.length - 1] = {
      ...last,
      content: last.content + token,
    };
    return updated;
  });
}
```
This works. For about two seconds. Then the message grows. React re-renders the entire message list on every single token. The markdown parser re-parses the entire accumulated text. The browser reflows the layout. At 40 tokens per second, that's 40 full render cycles per second — each one doing more work than the last because the message keeps growing.
On a fast MacBook, you might not notice. On a mid-range Android? The UI locks up. Scroll becomes janky. The typing cursor in the input field freezes. Your users think the app crashed.
Here's the thing most developers miss: the problem isn't React. React's diffing is fast. The problem is that you're asking React to diff an increasingly large tree at an absurdly high frequency. It's death by a thousand paper cuts.
Picture a court stenographer transcribing a speech word by word. After every word, they retype the entire transcript from the beginning, print it, and hand it to every person in the room. By page 50, they're spending more time retyping old content than writing new words. The fix is obvious: just append the new word to the existing page. That's the core insight behind token-by-token rendering — stop re-creating what already exists.
Measuring the Pain
Before we fix anything, let's quantify exactly how bad naive rendering gets. Open Chrome DevTools, record a Performance trace while streaming a long response, and you'll see something like this:
```text
Token    1: render  0.3ms (total message:    5 chars)
Token   50: render  1.2ms (total message:  300 chars)
Token  200: render  4.8ms (total message: 1500 chars)
Token  500: render 12ms   (total message: 4000 chars)
Token 1000: render 28ms   (total message: 8000 chars)
```
At token 500, each render already exceeds a 16.6ms frame budget at 60fps. The user sees stutter. By token 1000, you're dropping more frames than you're painting. And that's just the React render — we haven't even counted markdown parsing, syntax highlighting, or layout reflow.
The cost curve is roughly O(n) per token, where n is the total accumulated content. Total cost for the entire stream is O(n²) — each token re-processes everything before it. For a 2000-token response, that's millions of unnecessary DOM operations.
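To make that concrete, here's a toy cost model in plain TypeScript (my own illustration, not measured data): naive rendering pays for every already-rendered character again on each token, while append-only rendering pays for each character exactly once.

```typescript
// Toy cost model: naive rendering re-processes all accumulated text on
// every token, so total work is the sum 1 + 2 + ... + n tokens' worth.
function totalNaiveCost(tokens: number, charsPerToken: number): number {
  let total = 0;
  for (let i = 1; i <= tokens; i++) {
    total += i * charsPerToken; // render i re-processes i tokens of text
  }
  return total;
}

// Append-only rendering touches each character exactly once.
function totalAppendCost(tokens: number, charsPerToken: number): number {
  return tokens * charsPerToken;
}

totalNaiveCost(2000, 4);  // 8,004,000 character-operations: "millions"
totalAppendCost(2000, 4); // 8,000: linear in the output size
```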
Strategy 1: Token Batching with requestAnimationFrame
The first optimization targets the frequency of updates. Humans can't perceive individual updates faster than about 60fps. If tokens arrive at 40/second, batching them into frame-aligned flushes gives you identical visual results with far fewer renders.
```tsx
function useTokenBatcher() {
  const bufferRef = useRef<string>('');
  const rafRef = useRef<number>(0);
  const [text, setText] = useState('');

  const pushToken = useCallback((token: string) => {
    bufferRef.current += token;
    if (!rafRef.current) {
      rafRef.current = requestAnimationFrame(() => {
        setText(prev => prev + bufferRef.current);
        bufferRef.current = '';
        rafRef.current = 0;
      });
    }
  }, []);

  const reset = useCallback(() => {
    if (rafRef.current) cancelAnimationFrame(rafRef.current);
    rafRef.current = 0;
    bufferRef.current = '';
    setText('');
  }, []);

  return { text, pushToken, reset };
}
```
Here's what's happening: tokens arrive and accumulate in a plain string ref. The first token schedules a requestAnimationFrame. Subsequent tokens within the same frame just append to the buffer — no state update, no re-render. When the frame fires, all accumulated tokens flush as a single state update.
If tokens arrive at 40/second and the display runs at 60fps, you go from 40 renders per second to at most one render per frame. In practice, multiple tokens land in the same frame, so you'll see closer to 20-30 renders per second. Each render also does less work, because the state update is a simple string concatenation, not a full message-array clone.
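The same coalescing logic can be written framework-free, which also makes it unit-testable. This `TokenBatcher` class is a sketch of mine, not part of any library; the scheduler is injected so tests (or non-browser environments) can stand in for `requestAnimationFrame`.

```typescript
type Scheduler = (cb: () => void) => void;

// Coalesces tokens: many push() calls per frame, one flush per frame.
// In the browser you'd pass `cb => requestAnimationFrame(cb)` as the
// scheduler; a test can collect callbacks and fire them manually.
class TokenBatcher {
  private buffer = '';
  private text = '';
  private scheduled = false;

  constructor(
    private onFlush: (fullText: string) => void,
    private schedule: Scheduler,
  ) {}

  push(token: string): void {
    this.buffer += token;
    if (this.scheduled) return; // a flush is already queued for this frame
    this.scheduled = true;
    this.schedule(() => {
      this.text += this.buffer;
      this.buffer = '';
      this.scheduled = false;
      this.onFlush(this.text);
    });
  }
}
```

Ten tokens pushed between two frames produce exactly one `onFlush` call with all ten appended, which is the whole point of the pattern.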
Why Not debounce or setTimeout?
You might think: "I'll just debounce the state update with a 16ms delay." Two problems:

1. `setTimeout` isn't frame-aligned. A 16ms timeout can fire at the start of a frame, in the middle, or right before the next frame — you have no guarantee. `requestAnimationFrame` fires at exactly the right time: just before the browser paints.
2. Debounce delays the final update. With debounce, the last token in a burst waits for the debounce window to expire before displaying. rAF has no such trailing delay — it flushes on the next paint.
Strategy 2: Append-Only DOM Updates
Batching reduces how often you render. But each render still processes the entire message. The next optimization is making each render touch only the new content.
The key insight: once a token is rendered to the DOM, it never changes. LLM output is append-only. There's no reason to re-process or re-render text that's already on screen.
The Ref-Based Approach
The simplest append-only pattern bypasses React's reconciliation entirely for the streaming content:
```tsx
function StreamingMessage() {
  const containerRef = useRef<HTMLDivElement>(null);
  const contentRef = useRef('');

  // Exposed to the streaming transport (e.g. via useImperativeHandle);
  // appends go straight to the DOM, so React never re-renders for them.
  const appendTokens = useCallback((newText: string) => {
    contentRef.current += newText;
    if (containerRef.current) {
      const span = document.createElement('span');
      span.textContent = newText;
      containerRef.current.appendChild(span);
    }
  }, []);

  return <div ref={containerRef} className="message-content" />;
}
```
This is dramatically faster. Every token is O(1) — create one text node, append it. The browser doesn't reflow the entire message because appendChild only triggers incremental layout. React doesn't even know the DOM changed, which is fine because React doesn't need to manage this part of the tree during streaming.
The Hybrid Approach
Pure DOM manipulation loses React's benefits: no markdown rendering, no syntax highlighting, no component-based structure. The production-ready pattern is a hybrid — stream with raw DOM, then "commit" to React when the stream ends:
```tsx
function StreamingMessage() {
  const containerRef = useRef<HTMLDivElement>(null);
  const accumulatedRef = useRef('');
  const [finalContent, setFinalContent] = useState<string | null>(null);

  // Called by the streaming transport for each batch of tokens.
  const appendTokens = useCallback((text: string) => {
    accumulatedRef.current += text;
    if (containerRef.current) {
      const node = document.createTextNode(text);
      containerRef.current.appendChild(node);
    }
  }, []);

  // Called once when the stream ends: hand the full text to React.
  const finalize = useCallback(() => {
    setFinalContent(accumulatedRef.current);
  }, []);

  if (finalContent !== null) {
    return <MarkdownRenderer content={finalContent} />;
  }
  return <div ref={containerRef} />;
}
```
During streaming, you get O(1) appends with no React overhead. When the stream completes, you swap in the full React-rendered markdown — parsed once, with syntax highlighting, links, and all the fancy stuff. The user sees a brief flash of unstyled text becoming styled markdown, but in practice this transition is nearly invisible because most of the text was already on screen.
When switching from raw DOM to React-rendered markdown, the content can visibly "jump" as styling changes. To smooth this over, ensure your plain text container has the same font family, size, and line height as your markdown renderer. Match the CSS and the jump becomes imperceptible.
Strategy 3: Smart Auto-Scroll
Scroll behavior during streaming is one of those things that seems trivial until you build it. There are two distinct states:
- User is at the bottom — auto-scroll to keep new content visible
- User scrolled up — don't auto-scroll (they're reading earlier content)
Get this wrong and you'll hear about it. Force-scrolling when a user is reading earlier messages is infuriating. Not auto-scrolling when they're watching the stream is equally annoying.
Detecting Scroll Position
```tsx
function useAutoScroll(containerRef: RefObject<HTMLElement | null>) {
  const isNearBottomRef = useRef(true);
  const rafRef = useRef<number>(0);

  const checkScrollPosition = useCallback(() => {
    const el = containerRef.current;
    if (!el) return;
    const threshold = 100;
    const distanceFromBottom =
      el.scrollHeight - el.scrollTop - el.clientHeight;
    isNearBottomRef.current = distanceFromBottom < threshold;
  }, [containerRef]);

  const scrollToBottom = useCallback(() => {
    if (!isNearBottomRef.current) return;
    if (rafRef.current) cancelAnimationFrame(rafRef.current);
    rafRef.current = requestAnimationFrame(() => {
      const el = containerRef.current;
      if (el) {
        el.scrollTop = el.scrollHeight;
      }
      rafRef.current = 0;
    });
  }, [containerRef]);

  useEffect(() => {
    const el = containerRef.current;
    if (!el) return;
    el.addEventListener('scroll', checkScrollPosition, { passive: true });
    return () => {
      el.removeEventListener('scroll', checkScrollPosition);
      if (rafRef.current) cancelAnimationFrame(rafRef.current);
    };
  }, [containerRef, checkScrollPosition]);

  return { scrollToBottom, isNearBottomRef };
}
```
A few critical details here:
- Threshold of 100px — don't check for `distanceFromBottom === 0`. Scroll positions are often fractional due to subpixel rendering. A small threshold (50-150px) accounts for this.
- Passive scroll listener — the scroll event is tracked passively so it doesn't block the compositor thread.
- rAF-batched scroll — scrolling is batched into animation frames to avoid forced reflows from setting `scrollTop` on every token.
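The position check itself is pure arithmetic, worth isolating so the threshold behavior can be tested without a DOM. This `isNearBottom` helper is a hypothetical extraction of the same math the hook uses:

```typescript
// True when the viewport bottom is within `threshold` px of the content
// bottom, i.e. the user hasn't scrolled up to read earlier messages.
function isNearBottom(
  scrollHeight: number,
  scrollTop: number,
  clientHeight: number,
  threshold = 100,
): boolean {
  return scrollHeight - scrollTop - clientHeight < threshold;
}

// 2000px of content in a 600px viewport:
isNearBottom(2000, 1400, 600); // true:  pinned exactly to the bottom
isNearBottom(2000, 1350, 600); // true:  50px up, within the threshold
isNearBottom(2000, 800, 600);  // false: user is reading earlier messages
```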
CSS overflow-anchor
There's a CSS property that helps with scroll anchoring: overflow-anchor. When content is added above the viewport, the browser automatically adjusts scroll position to keep the currently visible content stable.
```css
.message-list {
  overflow-y: auto;
  overflow-anchor: auto; /* browser maintains scroll position */
}

.streaming-indicator {
  overflow-anchor: none; /* don't anchor to the loading spinner */
}
```
This is useful for chat UIs where new messages appear at the bottom. Without overflow-anchor, inserting content can cause the viewport to jump. The browser compensates automatically — but you need to opt out for elements like loading indicators that shouldn't be anchor targets.
Strategy 4: React Patterns for Streaming
Let's talk about the React-specific patterns that make or break streaming performance.
Isolate the Streaming Component
The most impactful pattern: make sure only the currently streaming message re-renders, not the entire message list.
```tsx
function MessageList({ messages }: { messages: Message[] }) {
  return (
    <div className="message-list">
      {messages.map(msg => (
        <MemoizedMessage key={msg.id} message={msg} />
      ))}
    </div>
  );
}

const MemoizedMessage = memo(function Message({
  message,
}: {
  message: Message;
}) {
  if (message.status === 'streaming') {
    return <StreamingMessage message={message} />;
  }
  return <StaticMessage message={message} />;
});
```
React.memo prevents completed messages from re-rendering when the streaming message updates. But this only works if you're not passing new object references as props on every render. A common mistake is spreading the message into new objects:
```tsx
// This defeats memo — new object every render
<MemoizedMessage key={msg.id} message={{ ...msg }} />

// This works — same reference if msg hasn't changed
<MemoizedMessage key={msg.id} message={msg} />
```
useRef for Accumulation, useState for Display
The pattern that ties everything together:
```tsx
function useStreamingMessage() {
  const accumulatedRef = useRef('');
  const bufferRef = useRef('');
  const rafRef = useRef<number>(0);
  const [displayText, setDisplayText] = useState('');

  const onToken = useCallback((token: string) => {
    accumulatedRef.current += token;
    bufferRef.current += token;
    if (!rafRef.current) {
      rafRef.current = requestAnimationFrame(() => {
        setDisplayText(accumulatedRef.current);
        bufferRef.current = '';
        rafRef.current = 0;
      });
    }
  }, []);

  const onComplete = useCallback(() => {
    if (rafRef.current) {
      cancelAnimationFrame(rafRef.current);
      rafRef.current = 0;
    }
    setDisplayText(accumulatedRef.current);
  }, []);

  const reset = useCallback(() => {
    if (rafRef.current) cancelAnimationFrame(rafRef.current);
    accumulatedRef.current = '';
    bufferRef.current = '';
    rafRef.current = 0;
    setDisplayText('');
  }, []);

  return { displayText, onToken, onComplete, reset };
}
```
Two refs, one state:
- `accumulatedRef` holds the full text (always up to date, no render cost)
- `bufferRef` tracks what's new since the last flush (for rAF batching)
- `displayText` is the state that drives rendering (updated at most once per frame)
The onComplete handler is important — it cancels any pending rAF and forces a final synchronous update to ensure the last tokens aren't lost.
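To see why that final flush matters, here's a stripped-down accumulator (an illustrative sketch of mine, not the hook above) that models the completion edge case: tokens buffered after the last frame flush would be silently dropped without it.

```typescript
// Minimal accumulator demonstrating the completion edge case: tokens
// buffered after the last frame flush must not be lost.
class StreamAccumulator {
  private pending = '';
  private displayed = '';

  onToken(token: string): void {
    this.pending += token; // buffered; the next frame flush picks this up
  }

  flushFrame(): string {
    this.displayed += this.pending;
    this.pending = '';
    return this.displayed;
  }

  // Called when the stream ends: force a final synchronous flush so
  // trailing tokens aren't stuck waiting for a frame that never comes.
  complete(): string {
    return this.flushFrame();
  }
}
```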
When to Consider flushSync
React batches state updates automatically (since React 18, in all contexts). Sometimes during streaming, you want an update to hit the DOM immediately — for example, when the user sends a new message and you want it to appear before the AI starts responding.
```tsx
import { flushSync } from 'react-dom';

function sendMessage(text: string) {
  flushSync(() => {
    setMessages(prev => [...prev, { role: 'user', content: text }]);
  });
  startStreaming();
}
```
Use flushSync surgically — only for these critical "must appear now" moments. Using it on every token update defeats the purpose of batching and brings you right back to the performance problems we started with.
Never use flushSync inside a requestAnimationFrame callback. rAF already runs at the optimal time for visual updates. Wrapping a state update in flushSync inside rAF forces a synchronous re-render that can push the frame over the 16.6ms budget. Let React's normal batching handle updates inside rAF.
Strategy 5: Virtualized Message Lists
For long conversations — dozens or hundreds of messages — even memoized components have a cost. They exist in React's fiber tree, consuming memory and slowing reconciliation. Virtualization solves this by only rendering messages that are actually visible in the viewport.
How Virtualization Works
Instead of rendering all 500 messages, you render only the ~10-15 that fit in the viewport, plus a small overscan buffer above and below. As the user scrolls, messages outside the viewport are unmounted and new ones are mounted.
```text
Viewport:      [msg 45] [msg 46] [msg 47] [msg 48]
Overscan:      [msg 43] [msg 44] ... [msg 49] [msg 50]
Not rendered:  msg 1-42, msg 51-500
```
The key challenge with chat UIs: messages have variable heights. A code block message might be 400px tall. A short reply might be 32px. Traditional virtualization libraries need to know item heights upfront, which doesn't work for dynamic content.
Dynamic Height Measurement
The pattern for variable-height virtualization:
```tsx
function VirtualizedMessages({ messages }: { messages: Message[] }) {
  const heightCacheRef = useRef<Map<string, number>>(new Map());
  const containerRef = useRef<HTMLDivElement>(null);
  const [scrollTop, setScrollTop] = useState(0);
  const [containerHeight, setContainerHeight] = useState(0);

  // Track the viewport height so the visible range can be computed.
  useEffect(() => {
    const el = containerRef.current;
    if (!el) return;
    const observer = new ResizeObserver(() =>
      setContainerHeight(el.clientHeight),
    );
    observer.observe(el);
    setContainerHeight(el.clientHeight);
    return () => observer.disconnect();
  }, []);

  const measureMessage = useCallback(
    (id: string, element: HTMLElement | null) => {
      if (!element) return;
      heightCacheRef.current.set(id, element.getBoundingClientRect().height);
    },
    [],
  );

  const getEstimatedHeight = useCallback((msg: Message) => {
    return heightCacheRef.current.get(msg.id) ?? 80; // default guess
  }, []);

  const { startIdx, endIdx, offsetTop } = useMemo(() => {
    let accumulated = 0;
    let startIdx = 0;
    let endIdx = messages.length;
    let offsetTop = 0;
    for (let i = 0; i < messages.length; i++) {
      const h = getEstimatedHeight(messages[i]);
      if (accumulated + h >= scrollTop) {
        startIdx = Math.max(0, i - 3); // 3 rows of overscan above
        break;
      }
      accumulated += h;
    }
    // Pixel offset of the first rendered row, so the slice lines up
    // with the scrollbar position.
    for (let i = 0; i < startIdx; i++) {
      offsetTop += getEstimatedHeight(messages[i]);
    }
    accumulated = 0;
    for (let i = 0; i < messages.length; i++) {
      accumulated += getEstimatedHeight(messages[i]);
      if (accumulated > scrollTop + containerHeight) {
        endIdx = Math.min(messages.length, i + 4); // overscan below
        break;
      }
    }
    return { startIdx, endIdx, offsetTop };
  }, [messages, scrollTop, containerHeight, getEstimatedHeight]);

  const totalHeight = messages.reduce(
    (sum, m) => sum + getEstimatedHeight(m),
    0,
  );

  return (
    <div
      ref={containerRef}
      onScroll={e => setScrollTop(e.currentTarget.scrollTop)}
      style={{ overflow: 'auto', height: '100%' }}
    >
      {/* Spacer keeps the scrollbar sized for all messages */}
      <div style={{ height: totalHeight }}>
        <div style={{ transform: `translateY(${offsetTop}px)` }}>
          {messages.slice(startIdx, endIdx).map(msg => (
            <VirtualMessage
              key={msg.id}
              message={msg}
              onMeasure={measureMessage}
            />
          ))}
        </div>
      </div>
    </div>
  );
}
```
In production, you'd likely reach for a library like @tanstack/react-virtual or react-virtuoso that handles edge cases — ResizeObserver integration, smooth scrolling during height changes, and proper keyboard navigation. But understanding the underlying mechanism matters because you'll need to customize behavior for the streaming case.
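Before reaching for a library, the windowing math itself can be pulled out as a pure function, which is how you'd unit-test the range calculation without wiring up scroll events. This `visibleRange` helper is a hypothetical extraction, with 3 rows of overscan on each side:

```typescript
// Given per-item heights, compute which indices intersect the viewport,
// padded by `overscan` extra rows on each side. `end` is exclusive.
function visibleRange(
  heights: number[],
  scrollTop: number,
  viewportHeight: number,
  overscan = 3,
): { start: number; end: number } {
  let start = 0;
  let end = heights.length;
  let accumulated = 0;
  for (let i = 0; i < heights.length; i++) {
    if (accumulated + heights[i] >= scrollTop) {
      start = Math.max(0, i - overscan);
      break;
    }
    accumulated += heights[i];
  }
  accumulated = 0;
  for (let i = 0; i < heights.length; i++) {
    accumulated += heights[i];
    if (accumulated > scrollTop + viewportHeight) {
      end = Math.min(heights.length, i + overscan + 1);
      break;
    }
  }
  return { start, end };
}

// 100 messages of 80px each, scrolled 800px into a 400px viewport:
visibleRange(Array.from({ length: 100 }, () => 80), 800, 400);
// → { start: 6, end: 19 } — 13 rendered rows instead of 100
```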
Virtualization + Streaming
There's a tension between virtualization and streaming: the currently streaming message is constantly changing height. A naive virtualization setup would thrash — remeasuring and repositioning on every token.
The fix: exempt the streaming message from virtualization. Keep it as a regular DOM element pinned to the bottom. Only virtualize completed messages above it.
```tsx
function ChatContainer({ messages }: { messages: Message[] }) {
  const completedMessages = messages.filter(m => m.status !== 'streaming');
  const streamingMessage = messages.find(m => m.status === 'streaming');

  return (
    <div className="chat-container">
      <VirtualizedMessages messages={completedMessages} />
      {streamingMessage && (
        <StreamingMessage message={streamingMessage} />
      )}
    </div>
  );
}
```
Strategy 6: The Blinking Cursor Effect
That blinking cursor at the end of a streaming message isn't just aesthetic — it's a critical UX signal that tells users "the AI is still thinking." Without it, a pause between tokens looks like the response ended.
CSS-Only Cursor
The simplest approach uses a pseudo-element:
```css
.streaming-cursor::after {
  content: '';
  display: inline-block;
  width: 2px;
  height: 1.1em;
  background: var(--color-accent);
  margin-left: 2px;
  vertical-align: text-bottom;
  animation: blink 800ms steps(2) infinite;
}

@keyframes blink {
  0%, 100% { opacity: 1; }
  50% { opacity: 0; }
}

@media (prefers-reduced-motion: reduce) {
  .streaming-cursor::after {
    animation: none;
    opacity: 1;
  }
}
```
A few subtleties that separate a polished cursor from a janky one:
- `steps(2)` instead of linear — creates a sharp on/off blink rather than a fade. This matches the behavior of a real text cursor.
- `vertical-align: text-bottom` — aligns the cursor with the text baseline, not the line box top.
- `height: 1.1em` — slightly taller than the text to match native cursor proportions.
- Reduced motion — users who prefer reduced motion see a static cursor instead of a blinking one. Still communicates "streaming" without the animation.
Cursor Position During Append-Only Rendering
If you're using the raw DOM append strategy, the cursor needs special handling. You can't use ::after on a container that has child text nodes being appended — the pseudo-element position might not update correctly in all browsers.
Instead, use a dedicated cursor element that you reposition:
```tsx
function StreamingContent() {
  const containerRef = useRef<HTMLDivElement>(null);
  const cursorRef = useRef<HTMLSpanElement>(null);

  const appendTokens = useCallback((text: string) => {
    const container = containerRef.current;
    const cursor = cursorRef.current;
    if (!container || !cursor) return;
    const node = document.createTextNode(text);
    container.insertBefore(node, cursor);
  }, []);

  return (
    <div ref={containerRef}>
      <span ref={cursorRef} className="streaming-cursor" />
    </div>
  );
}
```
By inserting text nodes before the cursor element rather than appending after it, the cursor naturally stays at the end of the content.
Putting It All Together
All of these strategies compose into one architecture: batch tokens with rAF, append to the DOM during streaming, isolate the streaming message from the memoized list, virtualize completed messages, and commit to rendered markdown on completion.
The Performance Difference
| Approach | Renders/sec | Work per render | Total cost (1000 tokens) |
|---|---|---|---|
| Naive setState per token | 40 | O(n) — re-render all messages | O(n²) |
| rAF batching | ~20 | O(n) — re-render all messages | O(n²), but ~2× fewer renders |
| rAF + component isolation | ~20 | O(k) — only streaming message | O(n·k), k constant |
| rAF + append-only DOM | ~20 | O(1) — append new text only | O(n) — linear |

The difference between O(n²) and O(n) is the difference between a UI that locks up and one that's buttery smooth.
Why Not Web Workers for Token Processing?
You might think: "offload token processing to a Web Worker." But here's why that rarely helps for token rendering specifically:
The bottleneck isn't CPU-bound token processing — it's DOM updates, which can only happen on the main thread. Moving string concatenation to a worker saves microseconds but adds the overhead of postMessage serialization. The real wins come from reducing the frequency and scope of DOM work — which is what batching and append-only patterns achieve.
Where workers do help is in the markdown parsing step. If you're parsing markdown on every token (which you shouldn't — see the module on incremental markdown parsing), offloading that parse to a worker prevents it from blocking interaction. But the better fix is to not re-parse at all during streaming.
| What developers do | What they should do |
|---|---|
| **Calling setState on every token** to update the message content. Each setState triggers a React render cycle; at 40 tokens/second, that's 40 renders per second, each re-diffing the entire component tree. Buffering in a ref costs nothing until the rAF flush, which batches all accumulated tokens into a single render. | Buffer tokens in a ref and flush with `requestAnimationFrame` |
| **Re-rendering the entire message list** when the streaming message updates. Completed messages never change during streaming. Without memo, React re-renders every message in the list on every token update; with memo, only the streaming message component re-renders. | Use `React.memo` to isolate completed messages from the streaming message |
| **Force-scrolling to the bottom on every token** regardless of user position. Users scroll up to read earlier messages; force-scrolling yanks them back to the bottom 40 times per second. Check distance from bottom with a threshold and only scroll when the user was already at the end. | Track scroll position and only auto-scroll when the user is near the bottom |
| **Re-parsing the entire accumulated markdown on every token.** Markdown parsing is expensive, especially with syntax highlighting; parsing 4000 characters of markdown 40 times per second is catastrophic for performance. Streaming text raw and parsing once at the end reduces total parse work from O(n²) to O(n). | Render raw text during streaming, parse markdown once when the stream completes |
1. Buffer tokens in refs, flush to state with `requestAnimationFrame` — never setState per token
2. Isolate the streaming message with `React.memo` so completed messages don't re-render
3. Use append-only DOM updates during streaming — each token should be O(1) work
4. Auto-scroll only when the user is near the bottom — use a passive scroll listener with a threshold
5. Exempt the streaming message from virtualization — virtualize only completed messages
6. Switch from raw text to rendered markdown only when the stream completes
7. Always respect `prefers-reduced-motion` for the cursor animation