Streaming AI Responses with SSE
The 3-Second Rule That Changed AI UIs
Try this experiment: open ChatGPT and ask it something complex. Now imagine if instead of seeing tokens appear word-by-word, you stared at a spinner for 8 seconds and then the entire response popped in at once. Same content, same wait time, same answer. But the experience is completely different.
Streaming isn't about speed. The model takes the same time to generate the full response either way. Streaming is about perceived latency. When the first token appears in 200ms instead of 8 seconds, the user's brain switches from "is this broken?" to "it's thinking, and I can already start reading." That shift is the difference between an AI product people love and one they abandon.
Every major AI product — ChatGPT, Claude, Gemini, Copilot — streams responses. Not because it's trendy, but because the psychology of waiting demands it. And the protocol powering all of them? Server-Sent Events.
Think of SSE like a news ticker on a TV screen. You don't wait for the entire day's news to be compiled before the ticker starts scrolling. As soon as the newsroom has one headline ready, it pushes it to the screen. Viewers start reading immediately while new headlines keep arriving. The connection stays open, the data flows one direction (server to client), and each item is a self-contained event. That's SSE — a persistent one-way channel where the server pushes events as they become available.
What SSE Actually Is
Server-Sent Events is a dead-simple protocol built on top of HTTP. The server responds with Content-Type: text/event-stream and keeps the connection open, sending structured text events over time.
Here's the raw wire format:
event: message_start
data: {"type":"message_start","message":{"id":"msg_01X","role":"assistant"}}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":"Hello"}}

event: content_block_delta
data: {"type":"content_block_delta","delta":{"type":"text_delta","text":" world"}}

event: message_stop
data: {"type":"message_stop"}
Each event has a clear structure:
- `event:` — the event type (optional, defaults to `message`)
- `data:` — the payload (can span multiple lines, with a `data:` prefix on each)
- `id:` — a unique event ID for reconnection (optional)
- `retry:` — reconnection interval in milliseconds (optional)
- Events are separated by a blank line (two newlines: `\n\n`)
That's the entire protocol. No binary framing, no handshake negotiation, no magic bytes. Just structured text over a long-lived HTTP response.
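Emitting this format from a server is plain string concatenation. Here's a minimal sketch of a serializer; the `formatSSE` helper and its event shape are made up for illustration, not part of any library:

```typescript
// Hypothetical helper: serialize one SSE event into the wire format above.
type OutgoingEvent = {
  event?: string;
  data: string;
  id?: string;
  retry?: number;
};

function formatSSE({ event, data, id, retry }: OutgoingEvent): string {
  const lines: string[] = [];
  if (event) lines.push(`event: ${event}`);
  if (id) lines.push(`id: ${id}`);
  if (retry !== undefined) lines.push(`retry: ${retry}`);
  // A multi-line payload gets its own "data:" prefix on every line.
  for (const part of data.split('\n')) lines.push(`data: ${part}`);
  return lines.join('\n') + '\n\n'; // the blank line terminates the event
}

console.log(formatSSE({ event: 'message_stop', data: '{"type":"message_stop"}' }));
```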
SSE vs WebSocket vs Long Polling
Before we go deeper, let's settle when to use what:
| Feature | SSE | WebSocket | Long Polling |
|---|---|---|---|
| Direction | Server → Client only | Bidirectional | Server → Client only |
| Protocol | HTTP/1.1 or HTTP/2 | WS (upgrade from HTTP) | HTTP (repeated requests) |
| Auto reconnect | Built-in | Manual | Manual |
| Binary data | Text only | Text + Binary | Either |
| Auth headers | No custom headers via EventSource (use fetch) | Via handshake only | Per request |
| Best for | LLM streaming, live feeds | Chat, gaming, real-time collab | Legacy fallback |
For LLM streaming, SSE wins. The data flows one direction (server to client), it's always text (JSON events), and you don't need bidirectional communication for token streaming. WebSocket is overkill — you'd be establishing a persistent bidirectional channel just to read from it.
The EventSource API (and Why You Won't Use It)
The browser ships with a built-in EventSource API for SSE:
const source = new EventSource('/api/stream');

source.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  console.log(data);
});

source.addEventListener('error', (event) => {
  console.error('Connection lost, reconnecting...');
});
Clean, simple, automatic reconnection. Sounds perfect. So why does every AI product ignore it?
Three deal-breaking limitations:
- GET only — `EventSource` can only make GET requests. LLM APIs require POST with a JSON body containing the messages, model, temperature, etc.
- No custom headers — you can't set `Authorization: Bearer sk-...` or any custom headers. LLM APIs always require authentication headers.
- No request body — even if you could POST, there's no way to send a request body.
These aren't edge cases — they're fundamental requirements for any AI API. The EventSource API was designed for simple server-push scenarios like stock tickers or notification feeds. LLM streaming needs something more flexible.
fetch + ReadableStream: The Real Pattern
Here's what production AI apps actually use — fetch with ReadableStream:
async function streamChat(messages) {
  const response = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': API_KEY,
      'anthropic-version': '2023-06-01',
    },
    body: JSON.stringify({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 1024,
      stream: true,
      messages,
    }),
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const json = line.slice(6);
        // OpenAI-style terminator; Anthropic instead ends with message_stop
        if (json === '[DONE]') return;
        const event = JSON.parse(json);
        if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
          process.stdout.write(event.delta.text);
        }
      }
    }
  }
}
Let's break down why every piece matters:
- `response.body.getReader()` — gives you a `ReadableStreamDefaultReader` that reads chunks as they arrive, not after the full response downloads
- `TextDecoder` with `{ stream: true }` — handles multi-byte UTF-8 characters that might be split across chunks
- Line buffer — SSE events are line-delimited, but network chunks don't respect line boundaries. A chunk might end mid-line, so you keep the incomplete last line in a buffer
- `lines.pop()` — the last element after splitting might be an incomplete line, so you save it for the next chunk
Never use decoder.decode(value) without { stream: true } when processing a stream. Without it, the decoder treats each chunk as a complete message, which corrupts multi-byte characters (like emoji or non-ASCII text) that get split across chunk boundaries. You'll see garbled output intermittently — the kind of bug that passes every test but breaks in production with real user input.
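You can see the failure mode in isolation. This sketch splits a single emoji's UTF-8 bytes across two "chunks" and decodes them both ways:

```typescript
// "😀" (U+1F600) is four UTF-8 bytes: f0 9f 98 80. Split it mid-character.
const chunks = [new Uint8Array([0xf0, 0x9f]), new Uint8Array([0x98, 0x80])];

// Wrong: a fresh decode per chunk treats each as complete, yielding U+FFFD
// replacement characters instead of the emoji.
const broken = chunks.map((c) => new TextDecoder().decode(c)).join('');

// Right: one decoder with { stream: true } buffers the partial sequence
// until the remaining bytes arrive.
const decoder = new TextDecoder();
const correct = chunks.map((c) => decoder.decode(c, { stream: true })).join('');

console.log(broken);  // garbled replacement characters
console.log(correct); // "😀"
```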
SSE Event Formats From Major Providers
Each AI provider structures their SSE events differently. Understanding these formats is essential for building provider-agnostic streaming UIs.
Anthropic's Event Protocol
Anthropic uses typed events with a clear lifecycle:
event: message_start
data: {"type":"message_start","message":{"id":"msg_01X","model":"claude-sonnet-4-20250514","role":"assistant","usage":{"input_tokens":25}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" there!"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":12}}

event: message_stop
data: {"type":"message_stop"}
The lifecycle is explicit: message_start → content_block_start → deltas → content_block_stop → message_delta → message_stop. Each content block has an index, which matters when the model returns multiple blocks (text + tool use).
Anthropic also sends ping events as keep-alives and typed delta variants: text_delta for text, input_json_delta for tool call arguments, and thinking_delta for extended thinking content.
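A consumer typically dispatches on the delta type. Here's a sketch; the routing categories and the `routeDelta` helper are mine, while the delta shapes follow the event examples above:

```typescript
// Sketch: route Anthropic delta variants to the right accumulator.
type AnthropicDelta =
  | { type: 'text_delta'; text: string }
  | { type: 'input_json_delta'; partial_json: string }
  | { type: 'thinking_delta'; thinking: string };

function routeDelta(delta: AnthropicDelta): { kind: string; chunk: string } {
  switch (delta.type) {
    case 'text_delta':
      return { kind: 'text', chunk: delta.text }; // visible assistant text
    case 'input_json_delta':
      // Partial JSON for tool-call arguments; buffer until content_block_stop.
      return { kind: 'tool_args', chunk: delta.partial_json };
    case 'thinking_delta':
      return { kind: 'thinking', chunk: delta.thinking };
  }
}

console.log(routeDelta({ type: 'text_delta', text: 'Hello' }));
```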
OpenAI's Event Format
OpenAI uses a simpler format with a single event type:
data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"}}]}

data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{"content":" there!"}}]}

data: {"id":"chatcmpl-abc","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
No event: field — every event is an unnamed data: line. The stream terminates with the literal string data: [DONE]. The delta object progressively adds content, and the final chunk includes a finish_reason.
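Consuming this format means folding each delta into a growing message. A minimal sketch; the `accumulate` helper is made up, and it ignores multi-choice responses for brevity:

```typescript
// Sketch: fold OpenAI stream chunks into the final message text.
type OpenAIChunk = {
  choices: {
    index: number;
    delta: { role?: string; content?: string };
    finish_reason?: string;
  }[];
};

function accumulate(chunks: OpenAIChunk[]): string {
  let content = '';
  for (const chunk of chunks) {
    // Each delta carries only the new fragment; absent content means a
    // role-only or finish chunk.
    content += chunk.choices[0]?.delta.content ?? '';
  }
  return content;
}

const chunks: OpenAIChunk[] = [
  { choices: [{ index: 0, delta: { role: 'assistant', content: 'Hello' } }] },
  { choices: [{ index: 0, delta: { content: ' there!' } }] },
  { choices: [{ index: 0, delta: {}, finish_reason: 'stop' }] },
];
console.log(accumulate(chunks)); // "Hello there!"
```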
Vercel AI SDK Data Stream Protocol
The Vercel AI SDK uses a prefix-based protocol where each line starts with a type identifier:
f:{"messageId":"msg-1"}
0:"Hello"
0:" there"
0:"!"
d:{"finishReason":"stop","usage":{"promptTokens":10,"completionTokens":5}}
Prefixes map to types: 0 for text deltas, f for start, d for done/finish, 9 for tool calls, g for reasoning, and more. This protocol is optimized for the AI SDK's React hooks.
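Parsing a line of this protocol is a prefix lookup plus a JSON parse. A sketch handling only the prefixes named above (`f`, `0`, `d`); treat it as illustrative, not a complete implementation of the AI SDK protocol:

```typescript
// Sketch: parse one line of the prefix-based data stream protocol.
function parseDataStreamLine(line: string): { type: string; value: unknown } | null {
  const colon = line.indexOf(':');
  if (colon === -1) return null;

  const prefix = line.slice(0, colon);
  const payload = JSON.parse(line.slice(colon + 1)); // rest of line is JSON

  switch (prefix) {
    case '0': return { type: 'text_delta', value: payload };
    case 'f': return { type: 'start', value: payload };
    case 'd': return { type: 'finish', value: payload };
    default:  return { type: `unknown:${prefix}`, value: payload };
  }
}

console.log(parseDataStreamLine('0:"Hello"')); // { type: 'text_delta', value: 'Hello' }
```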
Building a Production SSE Consumer
Let's build a proper SSE parser that handles the real-world edge cases:
type SSEEvent = {
  event: string;
  data: string;
  id?: string;
  retry?: number;
};

function parseSSEEvents(chunk: string): {
  events: SSEEvent[];
  remaining: string;
} {
  const events: SSEEvent[] = [];
  const blocks = chunk.split('\n\n');
  const remaining = blocks.pop() ?? '';

  for (const block of blocks) {
    if (!block.trim()) continue;

    let event = 'message';
    let data = '';
    let id: string | undefined;
    let retry: number | undefined;

    for (const line of block.split('\n')) {
      if (line.startsWith('event: ')) {
        event = line.slice(7);
      } else if (line.startsWith('data: ')) {
        data += (data ? '\n' : '') + line.slice(6);
      } else if (line.startsWith('id: ')) {
        id = line.slice(4);
      } else if (line.startsWith('retry: ')) {
        retry = parseInt(line.slice(7), 10);
      }
    }

    if (data) {
      events.push({ event, data, id, retry });
    }
  }

  return { events, remaining };
}
Notice how events are split by double newlines (\n\n), but the last block might be incomplete — so we save it as remaining for the next chunk.
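Here's that buffering idea in isolation, stripped of the field parsing. The `splitEvents` helper is made up for this demonstration:

```typescript
// Split a buffer on the event separator and keep whatever trails the last
// blank line for the next network chunk.
function splitEvents(buffer: string): { complete: string[]; remaining: string } {
  const blocks = buffer.split('\n\n');
  const remaining = blocks.pop() ?? ''; // possibly an incomplete event
  return { complete: blocks.filter((b) => b.trim() !== ''), remaining };
}

const chunk = 'data: {"a":1}\n\ndata: {"b"'; // second event cut off mid-line
const { complete, remaining } = splitEvents(chunk);

console.log(complete);  // ['data: {"a":1}']
console.log(remaining); // 'data: {"b"'
```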
Now wire it up with fetch:
async function* streamSSE(
  url: string,
  options: RequestInit
): AsyncGenerator<SSEEvent> {
  const response = await fetch(url, options);

  if (!response.ok) {
    const body = await response.text();
    throw new Error(`SSE request failed (${response.status}): ${body}`);
  }

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const { events, remaining } = parseSSEEvents(buffer);
      buffer = remaining;

      for (const event of events) {
        yield event;
      }
    }

    if (buffer.trim()) {
      const { events } = parseSSEEvents(buffer + '\n\n');
      for (const event of events) {
        yield event;
      }
    }
  } finally {
    reader.releaseLock();
  }
}
Using an async generator here is the elegant move. The caller gets a clean for await...of loop:
for await (const event of streamSSE('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ messages, model: 'claude-sonnet-4-20250514' }),
})) {
  if (event.event === 'content_block_delta') {
    const delta = JSON.parse(event.data);
    if (delta.delta.type === 'text_delta') {
      appendToUI(delta.delta.text);
    }
  }
}
Why async generators are perfect for SSE
An async generator (async function*) lets you yield values asynchronously. The consumer pulls events one at a time with for await...of, which naturally applies backpressure — if the consumer is slow to process events, the generator pauses at the yield until the consumer is ready for the next one. This is fundamentally different from a callback-based approach where events fire whether the consumer is ready or not. For SSE consumers, this means you never buffer unbounded events in memory — each event is processed before the next one is pulled.
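You can watch this pull-based interleaving in a toy example. The logging scheme here is mine, purely to make the execution order visible:

```typescript
// Sketch: the producer's body only runs when the consumer pulls the next
// value, so production and consumption interleave strictly.
async function* produce(log: string[]) {
  for (let i = 1; i <= 2; i++) {
    log.push(`produce ${i}`);
    yield i; // paused here until the consumer asks for the next value
  }
}

async function run(): Promise<string[]> {
  const log: string[] = [];
  for await (const n of produce(log)) {
    log.push(`consume ${n}`); // a slow consumer here would pause the producer
  }
  return log;
}

run().then((log) => console.log(log.join(', ')));
// produce 1, consume 1, produce 2, consume 2
```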
Error Handling and Reconnection
SSE connections drop. Networks are unreliable. Here's how to handle it:
async function streamWithRetry(
  url: string,
  options: RequestInit,
  onEvent: (event: SSEEvent) => void,
  maxRetries = 3
) {
  let retries = 0;
  let lastEventId: string | undefined;

  while (retries <= maxRetries) {
    try {
      const headers = new Headers(options.headers);
      if (lastEventId) {
        headers.set('Last-Event-ID', lastEventId);
      }

      for await (const event of streamSSE(url, { ...options, headers })) {
        retries = 0;
        if (event.id) lastEventId = event.id;
        onEvent(event);
      }
      return;
    } catch (error) {
      retries++;
      if (retries > maxRetries) {
        throw new Error(`Stream failed after ${maxRetries} retries: ${error}`);
      }
      const delay = Math.min(1000 * 2 ** (retries - 1), 30000);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
Key patterns here:
- Exponential backoff — wait 1s, 2s, 4s, 8s... up to 30s between retries
- `Last-Event-ID` — the SSE spec defines this header for reconnection. If the server assigns IDs to events, you can resume from where you left off (though most LLM APIs don't support this)
- Reset retry count on success — if we receive events, the connection is healthy
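The backoff schedule is easiest to see with the delay formula pulled out on its own; the `backoffDelay` helper name is mine:

```typescript
// Exponential backoff with a cap: 1000 * 2^(attempt - 1), at most 30000 ms.
function backoffDelay(attempt: number, baseMs = 1000, capMs = 30000): number {
  return Math.min(baseMs * 2 ** (attempt - 1), capMs);
}

console.log([1, 2, 3, 4, 5, 6].map((n) => backoffDelay(n)));
// [1000, 2000, 4000, 8000, 16000, 30000]
```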
Cancelling a Stream
Users change their mind. They click "Stop generating." Your code needs to handle this gracefully:
const controller = new AbortController();

const streamPromise = streamSSE('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ messages }),
  signal: controller.signal,
});

stopButton.addEventListener('click', () => {
  controller.abort();
});
When abort() is called, the fetch promise rejects with an AbortError. The ReadableStream is cancelled, and the underlying TCP connection is closed. The server stops generating tokens (most AI APIs detect client disconnection).
Here's the thing most people miss: you need to handle the AbortError differently from real errors. An abort isn't a failure — it's an intentional user action:
try {
  for await (const event of streamSSE(url, { signal })) {
    handleEvent(event);
  }
} catch (error) {
  if (error instanceof DOMException && error.name === 'AbortError') {
    return; // intentional cancellation, not a failure
  }
  throw error;
}
Common Mistakes
| What developers do | What they should do |
|---|---|
| Using `EventSource` for LLM API calls. `EventSource` only supports GET requests with no custom headers; every LLM API requires POST with auth headers and a JSON body. | Use `fetch` + `ReadableStream` for full control over method, headers, and body |
| Calling `TextDecoder.decode()` without `{ stream: true }`. Without the stream flag, multi-byte UTF-8 characters split across chunks get corrupted, causing intermittent garbled text with emoji and non-ASCII content. | Always pass `{ stream: true }` when decoding streaming chunks |
| Splitting chunks on newlines without buffering incomplete lines. Network chunks don't align with SSE event boundaries; a chunk can end in the middle of a `data:` line, and processing it as complete produces parse errors. | Keep a buffer and save the last incomplete line for the next chunk |
| Not handling `AbortError` separately from real errors. When users click "Stop generating", the stream throws an `AbortError`; showing an error message for an intentional action is a broken UX. | Check for `AbortError` and treat it as a clean cancellation, not a failure |
Key Rules
1. SSE is `text/event-stream` over HTTP — events separated by blank lines, fields prefixed with `event:`, `data:`, `id:`, `retry:`
2. Use `fetch` + `ReadableStream` for LLM streaming — `EventSource` is GET-only with no custom headers
3. Always pass `{ stream: true }` to `TextDecoder.decode()` to handle split multi-byte characters
4. Buffer incomplete lines between chunks — network boundaries don't respect event boundaries
5. Use `AbortController` for cancellation and handle `AbortError` as a clean exit, not a failure
6. Implement exponential backoff for reconnection — never hammer a failing endpoint
What's Next
You now understand the protocol layer — how SSE works, why EventSource falls short, and how to build a robust fetch-based consumer. But we're reading raw chunks and splitting strings. In the next topic, we'll dive into the ReadableStream API itself: TransformStream for parsing pipelines, TextDecoderStream for zero-copy decoding, and how to compose stream transforms that turn raw bytes into structured events with clean separation of concerns.