WebSocket Protocol and Implementation
HTTP Is a Monologue. WebSocket Is a Conversation.
HTTP was designed for documents: you ask, I respond, we're done. Real-time apps need something different: a persistent, bidirectional channel where either side can send data at any time, without the overhead of a new HTTP request (and, without keep-alive, a new TCP handshake) for every message.
That's WebSocket. One TCP connection, full-duplex communication, and a frame-based protocol that's surprisingly simple under the hood. Let's tear it apart.
The Opening Handshake
WebSocket doesn't replace HTTP — it hijacks it. The connection starts as a normal HTTP request, then "upgrades" to the WebSocket protocol. This is clever because it means WebSocket traffic passes through most proxies and firewalls that already allow HTTP.
```http
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Origin: https://example.com
```
The server responds:
```http
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```
That Sec-WebSocket-Accept value? The server appends the magic string 258EAFA5-E914-47DA-95CA-5AB0F964CE65 to your Sec-WebSocket-Key (the base64 string exactly as sent, not decoded), takes the SHA-1 hash, and base64-encodes the digest. This isn't security. It's a handshake check that proves the server actually understood the WebSocket request, so an ordinary HTTP response can't be misinterpreted as a WebSocket connection.
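A minimal sketch of that computation in Node.js, assuming the built-in node:crypto module; the function name computeAccept is our own. The sample key is the one from the handshake above, and a client should reject the upgrade if the server's value doesn't match.

```typescript
import { createHash } from "node:crypto";

// Fixed GUID from RFC 6455; every server uses the same string.
const WS_MAGIC = "258EAFA5-E914-47DA-95CA-5AB0F964CE65";

// Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key.
// The key is concatenated as-is (still base64), then SHA-1 hashed and base64-encoded.
function computeAccept(secWebSocketKey: string): string {
  return createHash("sha1")
    .update(secWebSocketKey + WS_MAGIC)
    .digest("base64");
}

console.log(computeAccept("dGhlIHNhbXBsZSBub25jZQ==")); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```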
Frame Anatomy
Once the handshake completes, HTTP is gone. Everything after is WebSocket frames — a lean binary format.
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |            (16/64)            |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+-------------------------------+
|                  Masking-key (0 or 4 bytes)                   |
+---------------------------------------------------------------+
|                         Payload Data                          |
+---------------------------------------------------------------+
```
The key fields:
- FIN bit: Is this the final fragment? Most messages are a single frame (FIN=1). Large messages can be fragmented across multiple frames.
- Opcode: What type of frame: 0x0 for a continuation of a fragmented message, 0x1 for text (UTF-8), 0x2 for binary, 0x8 for close, 0x9 for ping, 0xA for pong.
- MASK bit: Client-to-server frames MUST be masked (with a fresh random 4-byte key per frame). Server-to-client frames MUST NOT be. This asymmetry exists to prevent cache poisoning attacks on intermediary proxies.
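- Payload length: The 7-bit field holds the length directly for payloads up to 125 bytes. A value of 126 means the real length follows as a 16-bit integer (up to 65,535 bytes); 127 means it follows as a 64-bit integer for anything larger.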
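To make the field layout concrete, here is a minimal header parser. It's a sketch that assumes the whole header is already in one Uint8Array and ignores the RSV bits; the names FrameHeader, parseFrameHeader, and unmask are our own. The sample bytes are the masked "Hello" text frame from RFC 6455, Section 5.7.

```typescript
interface FrameHeader {
  fin: boolean;
  opcode: number;
  masked: boolean;
  payloadLength: number;
  maskingKey: Uint8Array | null;
  headerLength: number; // bytes before the payload starts
}

// Parse the fixed 2-byte header plus the optional extended length and masking key.
function parseFrameHeader(buf: Uint8Array): FrameHeader {
  if (buf.length < 2) throw new Error("need at least 2 bytes");
  const fin = (buf[0] & 0x80) !== 0;
  const opcode = buf[0] & 0x0f;
  const masked = (buf[1] & 0x80) !== 0;
  let payloadLength = buf[1] & 0x7f;
  let offset = 2;
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);

  if (payloadLength === 126) {
    // Next 2 bytes: 16-bit big-endian length
    payloadLength = view.getUint16(offset);
    offset += 2;
  } else if (payloadLength === 127) {
    // Next 8 bytes: 64-bit big-endian length (most significant bit must be 0)
    const len = view.getBigUint64(offset);
    if (len > BigInt(Number.MAX_SAFE_INTEGER)) throw new Error("frame too large");
    payloadLength = Number(len);
    offset += 8;
  }

  let maskingKey: Uint8Array | null = null;
  if (masked) {
    maskingKey = buf.slice(offset, offset + 4);
    offset += 4;
  }
  return { fin, opcode, masked, payloadLength, maskingKey, headerLength: offset };
}

// Masking is a byte-wise XOR with the 4-byte key, so the same function masks and unmasks.
function unmask(payload: Uint8Array, key: Uint8Array): Uint8Array {
  const out = new Uint8Array(payload.length);
  for (let i = 0; i < payload.length; i++) out[i] = payload[i] ^ key[i % 4];
  return out;
}

// Masked text frame carrying "Hello" (RFC 6455, Section 5.7)
const frame = new Uint8Array([0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58]);
const h = parseFrameHeader(frame);
const body = unmask(frame.subarray(h.headerLength, h.headerLength + h.payloadLength), h.maskingKey!);
console.log(h.fin, h.opcode, new TextDecoder().decode(body)); // true 1 Hello
```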
Think of a WebSocket frame like a letter in an envelope. The first 2 bytes are the envelope label (type, size hint, masking). Then optionally more size info, optionally a masking key, and finally the letter itself (payload). The overhead is just 2-14 bytes per frame — compared to HTTP headers that can be hundreds of bytes per request.
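The 2 to 14 byte range follows directly from the length and masking rules. A small sketch (the helper name headerSize is ours) that computes the header size for a given payload length and masking flag:

```typescript
// Header bytes for one frame: 2 fixed bytes, plus 2 or 8 for an extended
// length, plus 4 for the masking key on client-to-server frames.
function headerSize(payloadLength: number, masked: boolean): number {
  let size = 2;
  if (payloadLength > 65535) size += 8;    // 7-bit field = 127, 64-bit length follows
  else if (payloadLength > 125) size += 2; // 7-bit field = 126, 16-bit length follows
  if (masked) size += 4;
  return size;
}

console.log(headerSize(5, false));      // 2  (small server-to-client frame)
console.log(headerSize(5, true));       // 6  (same payload sent by a browser)
console.log(headerSize(1000, true));    // 8
console.log(headerSize(100_000, true)); // 14 (the maximum)
```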
Binary vs Text Frames
WebSocket supports two data types: text (opcode 0x1) and binary (opcode 0x2).
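```typescript
const ws = new WebSocket('wss://api.example.com/stream');

// Browsers default to 'blob'; set this before any message arrives to receive ArrayBuffers
ws.binaryType = 'arraybuffer';

ws.addEventListener('open', () => {
  // Text frame (opcode 0x1)
  ws.send('Hello');

  // Binary frame (opcode 0x2): 4 bytes holding one float32
  const buffer = new ArrayBuffer(4);
  const view = new DataView(buffer);
  view.setFloat32(0, 3.14); // DataView defaults to big-endian
  ws.send(buffer);
});

ws.addEventListener('message', (event) => {
  if (typeof event.data === 'string') {
    // Text frame: this example assumes the server sends JSON
    const parsed = JSON.parse(event.data);
    console.log('text message', parsed);
  } else {
    // Binary frame: an ArrayBuffer because of binaryType above
    const view = new DataView(event.data as ArrayBuffer);
    const value = view.getFloat32(0);
    console.log('binary message', value);
  }
});
```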
Text frames must carry valid UTF-8; an endpoint that receives invalid UTF-8 in a text message fails the connection (close code 1007). Binary frames are raw bytes, and you decide the encoding. For structured data, JSON over text frames is the default choice. For high-throughput scenarios (gaming, audio streaming, large datasets), a binary encoding such as Protocol Buffers or FlatBuffers (schema-based) or MessagePack (schemaless) can substantially reduce payload size and serialization cost.
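To make the binary option concrete without pulling in a library, here is a hand-rolled fixed layout standing in for something like Protocol Buffers. The message (a hypothetical player position update: a 32-bit id and two 32-bit floats) and the function names are illustrative assumptions; the result of encodePosition is what you would pass to ws.send().

```typescript
// Hypothetical position update: 4-byte id + two 4-byte floats = 12 bytes.
interface PositionUpdate {
  id: number;
  x: number;
  y: number;
}

const POSITION_BYTES = 12;

function encodePosition(msg: PositionUpdate): ArrayBuffer {
  const buf = new ArrayBuffer(POSITION_BYTES);
  const view = new DataView(buf);
  // Both ends must agree on byte order; this sketch uses little-endian.
  view.setUint32(0, msg.id, true);
  view.setFloat32(4, msg.x, true);
  view.setFloat32(8, msg.y, true);
  return buf;
}

function decodePosition(buf: ArrayBuffer): PositionUpdate {
  if (buf.byteLength !== POSITION_BYTES) throw new Error("unexpected length");
  const view = new DataView(buf);
  return {
    id: view.getUint32(0, true),
    x: view.getFloat32(4, true),
    y: view.getFloat32(8, true),
  };
}

const msg = { id: 42, x: 1.5, y: -3.25 };
const bin = encodePosition(msg);
console.log(bin.byteLength);             // 12
console.log(JSON.stringify(msg).length); // 27 (the same data as JSON text)
console.log(decodePosition(bin));        // { id: 42, x: 1.5, y: -3.25 }
```

Float32 keeps about 7 significant digits, so values like 3.14 come back slightly changed; pick field types with that tradeoff in mind.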
Heartbeats: Keeping the Connection Alive
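TCP connections can go silent without either side knowing the other is gone. NAT routers, proxies, and load balancers also drop connections that stay idle too long; the exact timeout depends on the device (60 seconds is a common load-balancer default). The WebSocket protocol includes ping/pong control frames for this, and the same idea works at the application level: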
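```typescript
// Application-level heartbeat for a browser WebSocket. Assumes the server
// answers every '__ping' message with a '__pong' message.
function createHeartbeat(
  ws: WebSocket,
  intervalMs = 30000,
  pongTimeoutMs = 5000,
  maxMissed = 3,
) {
  let missedPongs = 0;
  let pingTimer: ReturnType<typeof setInterval> | undefined;
  let pongTimer: ReturnType<typeof setTimeout> | undefined;

  const stop = () => {
    clearInterval(pingTimer);
    clearTimeout(pongTimer);
  };

  const startPing = () => {
    stop(); // never stack two intervals
    pingTimer = setInterval(() => {
      if (ws.readyState !== WebSocket.OPEN) {
        stop();
        return;
      }
      ws.send(JSON.stringify({ type: '__ping', ts: Date.now() }));
      // A ping counts as missed only if no pong arrives within pongTimeoutMs
      clearTimeout(pongTimer);
      pongTimer = setTimeout(() => {
        missedPongs++;
        if (missedPongs >= maxMissed) {
          stop();
          ws.close(4000, 'Heartbeat timeout'); // 4000-4999: application-defined codes
        }
      }, pongTimeoutMs);
    }, intervalMs);
  };

  // Call this from your message handler whenever a '__pong' arrives
  const handlePong = () => {
    missedPongs = 0;
    clearTimeout(pongTimer);
  };

  return { startPing, handlePong, stop };
}
```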
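The browser WebSocket API does NOT expose ping/pong frames: the browser answers incoming pings on its own, and page scripts can neither send a ping nor observe a pong. For client-side heartbeats you need your own ping/pong over regular data frames, as shown above, which assumes the server replies to every __ping message with a __pong (a convention of this example, not of the protocol). The server side is different: the Node.js ws library exposes protocol-level ping/pong through ws.ping() and the 'pong' event.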
Building a Production WebSocket Client
A real-world WebSocket client needs automatic reconnection, message buffering during disconnects, typed message handling, and connection state management. Here's a compact implementation that covers those pieces:
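```typescript
type ConnectionState = 'connecting' | 'connected' | 'reconnecting' | 'disconnected';
type MessageHandler = (data: unknown) => void;

interface WSClientOptions {
  url: string;
  protocols?: string[];
  maxReconnectAttempts?: number;
  baseReconnectDelay?: number;
  heartbeatInterval?: number;
}

class WSClient {
  private ws: WebSocket | null = null;
  private state: ConnectionState = 'disconnected';
  private reconnectAttempts = 0;
  private messageQueue: string[] = []; // unbounded; cap it if disconnects can be long
  private handlers = new Map<string, Set<MessageHandler>>();
  private reconnectTimer: ReturnType<typeof setTimeout> | null = null;
  private heartbeatTimer: ReturnType<typeof setInterval> | null = null;
  private lastPongAt = 0;
  private manualClose = false; // set by disconnect() so an intentional close never reconnects

  constructor(private opts: WSClientOptions) {}

  get connectionState(): ConnectionState {
    return this.state;
  }

  connect(): void {
    // Don't open a second socket while one is open or still handshaking
    const rs = this.ws?.readyState;
    if (rs === WebSocket.OPEN || rs === WebSocket.CONNECTING) return;
    this.clearReconnectTimer();
    this.manualClose = false;
    this.state = this.reconnectAttempts > 0 ? 'reconnecting' : 'connecting';

    const ws = new WebSocket(this.opts.url, this.opts.protocols);
    this.ws = ws;

    ws.addEventListener('open', () => {
      this.state = 'connected';
      this.reconnectAttempts = 0;
      this.flushQueue();
      this.startHeartbeat();
    });

    ws.addEventListener('message', (event) => {
      // Expects JSON envelopes of the form { type, payload } in text frames
      if (typeof event.data !== 'string') return;
      let msg: unknown;
      try {
        msg = JSON.parse(event.data);
      } catch {
        return; // drop malformed frames instead of throwing inside the listener
      }
      if (typeof msg !== 'object' || msg === null) return;
      const { type, payload } = msg as { type?: unknown; payload?: unknown };
      if (type === '__pong') {
        this.lastPongAt = Date.now();
        return;
      }
      if (typeof type === 'string') {
        this.handlers.get(type)?.forEach((fn) => fn(payload));
      }
    });

    ws.addEventListener('close', (event) => {
      if (this.ws !== ws) return; // ignore a socket that has already been replaced
      this.stopHeartbeat();
      if (this.manualClose || event.code === 1000) {
        this.state = 'disconnected';
      } else {
        this.scheduleReconnect();
      }
    });

    // 'error' is always followed by 'close', which drives reconnection
    ws.addEventListener('error', () => ws.close());
  }

  send(type: string, payload: unknown): void {
    const msg = JSON.stringify({ type, payload });
    if (this.ws?.readyState === WebSocket.OPEN) {
      this.ws.send(msg);
    } else {
      this.messageQueue.push(msg); // replayed in order once reconnected
    }
  }

  // Returns an unsubscribe function
  on(type: string, handler: MessageHandler): () => void {
    if (!this.handlers.has(type)) {
      this.handlers.set(type, new Set());
    }
    this.handlers.get(type)!.add(handler);
    return () => this.handlers.get(type)?.delete(handler);
  }

  disconnect(): void {
    this.manualClose = true;
    this.clearReconnectTimer();
    this.stopHeartbeat();
    this.reconnectAttempts = 0;
    this.ws?.close(1000, 'Client disconnect');
    this.state = 'disconnected';
  }

  private flushQueue(): void {
    while (this.messageQueue.length > 0 && this.ws?.readyState === WebSocket.OPEN) {
      this.ws.send(this.messageQueue.shift()!);
    }
  }

  private scheduleReconnect(): void {
    const max = this.opts.maxReconnectAttempts ?? 10;
    if (this.reconnectAttempts >= max) {
      this.state = 'disconnected';
      return;
    }
    // Exponential backoff capped at 30 s, then "equal jitter": wait 50-100% of that delay
    const base = this.opts.baseReconnectDelay ?? 1000;
    const delay = Math.min(base * Math.pow(2, this.reconnectAttempts), 30000);
    const wait = delay * (0.5 + Math.random() * 0.5);
    this.reconnectAttempts++;
    this.state = 'reconnecting';
    this.reconnectTimer = setTimeout(() => this.connect(), wait);
  }

  private clearReconnectTimer(): void {
    if (this.reconnectTimer) {
      clearTimeout(this.reconnectTimer);
      this.reconnectTimer = null;
    }
  }

  private startHeartbeat(): void {
    this.stopHeartbeat();
    const interval = this.opts.heartbeatInterval ?? 30000;
    this.lastPongAt = Date.now();
    this.heartbeatTimer = setInterval(() => {
      const ws = this.ws;
      if (!ws || ws.readyState !== WebSocket.OPEN) return;
      // Assumes the server echoes '__ping' as '__pong'. No pong for two intervals: treat the
      // link as dead. The server normally echoes 4000 (or the socket dies with 1006), so the
      // 'close' handler schedules a reconnect.
      if (Date.now() - this.lastPongAt > interval * 2) {
        ws.close(4000, 'Heartbeat timeout');
        return;
      }
      ws.send(JSON.stringify({ type: '__ping', ts: Date.now() }));
    }, interval);
  }

  private stopHeartbeat(): void {
    if (this.heartbeatTimer) {
      clearInterval(this.heartbeatTimer);
      this.heartbeatTimer = null;
    }
  }
}
```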
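The reconnect delay in scheduleReconnect is exponential backoff with "equal jitter": the capped delay is scaled to somewhere between 50% and 100% of itself, so clients that dropped at the same moment don't all reconnect in lockstep. Pulled out as a pure function (backoffDelay is our name, and the random source is a parameter only so the bounds can be printed), the schedule looks like this:

```typescript
// Same formula as scheduleReconnect: base * 2^attempt, capped, then scaled to [50%, 100%).
function backoffDelay(
  attempt: number,
  base = 1000,
  cap = 30000,
  random: () => number = Math.random,
): number {
  const delay = Math.min(base * Math.pow(2, attempt), cap);
  return delay * (0.5 + random() * 0.5);
}

// Lower and upper bounds per attempt (Math.random never actually reaches 1).
for (let attempt = 0; attempt < 7; attempt++) {
  const low = backoffDelay(attempt, 1000, 30000, () => 0);
  const high = backoffDelay(attempt, 1000, 30000, () => 1);
  console.log(`attempt ${attempt}: ${low}-${high} ms`);
}
// attempt 0: 500-1000 ms, attempt 1: 1000-2000 ms, ... attempt 5 and later: 15000-30000 ms
```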
Security: WSS, Origin Validation, Authentication
1. Always use WSS (WebSocket Secure) in production; plain WS traffic can be read and modified by any intermediary.
2. Validate the Origin header server-side to prevent Cross-Site WebSocket Hijacking (CSWSH). This stops malicious web pages, not non-browser clients, which can send any Origin they like.
3. Authenticate via a short-lived token in the URL or in the first message; cookies alone are vulnerable to CSWSH.
4. Set message size limits server-side to prevent memory exhaustion attacks.
5. Rate-limit messages per client to prevent abuse.
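The message size limit and the per-client rate limit from the list above are usually enforced together as each message arrives. A minimal per-connection sketch, assuming a token bucket (our choice of algorithm; the article doesn't prescribe one), a hypothetical 64 KiB size cap, and an injectable clock so the behavior can be tested. The names TokenBucket, admit, and MAX_MESSAGE_BYTES are ours.

```typescript
// Hypothetical limit; tune for your traffic.
const MAX_MESSAGE_BYTES = 64 * 1024;

// Token bucket: holds up to `capacity` tokens, refilled at `ratePerSec`.
// Each accepted message spends one token.
class TokenBucket {
  private tokens: number;
  private last: number;

  constructor(
    private capacity: number,
    private ratePerSec: number,
    private now: () => number = Date.now,
  ) {
    this.tokens = capacity;
    this.last = now();
  }

  tryTake(): boolean {
    const t = this.now();
    // Refill in proportion to elapsed time, never beyond capacity
    this.tokens = Math.min(this.capacity, this.tokens + ((t - this.last) / 1000) * this.ratePerSec);
    this.last = t;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Decide what to do with one incoming message on one connection.
function admit(messageBytes: number, bucket: TokenBucket): "ok" | "too_big" | "rate_limited" {
  if (messageBytes > MAX_MESSAGE_BYTES) return "too_big"; // e.g. close with 1009 (Message Too Big)
  if (!bucket.tryTake()) return "rate_limited";           // drop, or close with 1008 (Policy Violation)
  return "ok";
}
```

Checking size after a message is fully received still means buffering it first, so server libraries usually enforce the cap while reading frames; the Node.js ws library's WebSocketServer, for example, has a maxPayload option for this.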
The most common authentication pattern:
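```typescript
// fetchShortLivedToken() stands in for a call to your own authenticated HTTP endpoint
const token = await fetchShortLivedToken();
const ws = new WebSocket(`wss://api.example.com/ws?token=${encodeURIComponent(token)}`);
```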
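Why not cookies? Because the browser attaches cookies to the WebSocket handshake no matter which page opens the connection (unless the cookie's SameSite attribute prevents it), and the handshake is not subject to CORS. An attacker's page at evil.com could open a WebSocket to your server, the browser would happily send your session cookie along with it, and the attacker's script could then read everything the server sends back. This is Cross-Site WebSocket Hijacking (CSWSH).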
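The server-side counterpart is checking the Origin header before accepting the upgrade. A minimal sketch, assuming an exact-match allowlist (the origins listed are placeholders) and a plain function you would call from whatever upgrade hook your server library provides; isAllowedOrigin and ALLOWED_ORIGINS are our names.

```typescript
// Placeholder allowlist: exact scheme://host[:port] strings, no wildcards.
const ALLOWED_ORIGINS = new Set(["https://example.com", "https://app.example.com"]);

// Returns true if the upgrade request's Origin header is acceptable.
// Browsers always send Origin on WebSocket handshakes, so a missing header
// means a non-browser client; whether to allow those is a policy decision.
function isAllowedOrigin(origin: string | undefined, allowNonBrowser = false): boolean {
  if (origin === undefined) return allowNonBrowser;
  return ALLOWED_ORIGINS.has(origin);
}
```

Exact matching is deliberate: a prefix or substring check would accept an origin like https://example.com.evil.net.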
Never put long-lived secrets in WebSocket URLs. The URL appears in server access logs, browser history, and potentially intermediary proxy logs. Use a short-lived, single-use token that expires within seconds.
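On the server, "short-lived and single-use" can be as simple as an in-memory map that deletes a token the first time it is presented. A sketch assuming a single server process (several nodes would need a shared store), a hypothetical 10-second TTL, and Node's crypto.randomBytes for token generation; TicketStore, issue, and redeem are our names.

```typescript
import { randomBytes } from "node:crypto";

// In-memory, single-process store of one-time WebSocket tickets.
// A real store would also sweep expired tickets that were never redeemed.
class TicketStore {
  private tickets = new Map<string, { userId: string; expiresAt: number }>();

  constructor(private ttlMs = 10_000, private now: () => number = Date.now) {}

  // Called from an authenticated HTTP endpoint (the one fetchShortLivedToken would hit).
  issue(userId: string): string {
    const ticket = randomBytes(32).toString("base64url");
    this.tickets.set(ticket, { userId, expiresAt: this.now() + this.ttlMs });
    return ticket;
  }

  // Called during the WebSocket upgrade. Deleting before the expiry check makes
  // every ticket single-use, whether or not it was still valid.
  redeem(ticket: string): string | null {
    const entry = this.tickets.get(ticket);
    this.tickets.delete(ticket);
    if (!entry || entry.expiresAt < this.now()) return null;
    return entry.userId;
  }
}
```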
WebSocket vs WebTransport
WebSocket runs over TCP. That means head-of-line blocking: if one packet is lost, everything behind it waits for the retransmission. For many real-time scenarios this is acceptable. But for applications that send multiple independent streams (multiplayer games, video conferencing), this becomes a bottleneck.
WebTransport is a newer alternative that runs over HTTP/3 (QUIC), which means:
- Multiple independent streams — packet loss in one stream doesn't block others
- Unreliable datagrams — fire-and-forget messages (perfect for position updates in games)
- Built-in multiplexing — no need for application-level channel management
- Transport-level congestion and flow control in QUIC, with flow control applied per stream rather than to a single byte stream as in TCP
| Feature | WebSocket | WebTransport |
|---|---|---|
| Transport | TCP | QUIC (UDP-based) |
| Multiplexing | Single stream | Multiple independent streams |
| Unreliable mode | No | Yes (datagrams) |
| Head-of-line blocking | Yes | No (per-stream) |
| Browser support | Universal | Chrome, Edge, Firefox (growing) |
| Maturity | Stable since 2011 | Emerging standard |
When to Skip Socket.io
Socket.io adds automatic reconnection, room management, event namespaces, and fallback to long-polling. It's a solid abstraction if you need broad compatibility and don't want to build the infrastructure we covered above.
But it comes with costs: a custom protocol on top of WebSocket (meaning you can't connect with a plain WebSocket client), a larger client bundle (tens of kilobytes minified, versus nothing extra for the browser's built-in WebSocket API), and behavior that can be surprising (automatic reconnection with defaults you might not want).
| What developers do | What they should do | Why |
|---|---|---|
| Using Socket.io because WebSocket is too low-level | Build a thin wrapper (like the WSClient above) when you only need basic features | Socket.io's abstraction layer adds complexity and bundle size. For most apps, a focused wrapper over the native WebSocket API gives you exactly what you need without the overhead. |
| Sending JSON for every message type without a schema | Define a typed message protocol with discriminated unions | Type-safe message contracts catch protocol mismatches at compile time. Without them, a typo in a message type silently fails at runtime. |
| Relying on WebSocket close events to detect disconnection | Use application-level heartbeats to detect silent connection failures | TCP can take minutes or longer to notice a dead peer, and NAT routers silently drop idle connections. In the browser, only application-level ping/pong gives you timely detection. |
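A typed message protocol with discriminated unions might look like the following sketch. The message types (chat, typing, presence) and the function names are hypothetical; the point is that the type field narrows the payload, and the exhaustiveness check turns a forgotten case into a compile error.

```typescript
// Every message the server can send, discriminated by `type`.
type ServerMessage =
  | { type: "chat"; payload: { from: string; text: string } }
  | { type: "typing"; payload: { from: string } }
  | { type: "presence"; payload: { userId: string; online: boolean } };

// JSON.parse returns untyped data, so check the discriminant at runtime before
// trusting the static type (a schema validator would also check the payloads).
const KNOWN_TYPES = new Set<ServerMessage["type"]>(["chat", "typing", "presence"]);

function parseServerMessage(raw: string): ServerMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const type = (data as { type?: unknown }).type;
  if (typeof type !== "string" || !KNOWN_TYPES.has(type as ServerMessage["type"])) return null;
  return data as ServerMessage;
}

function formatMessage(msg: ServerMessage): string {
  switch (msg.type) {
    case "chat":
      return `${msg.payload.from}: ${msg.payload.text}`; // payload narrowed to the chat shape
    case "typing":
      return `${msg.payload.from} is typing`;
    case "presence":
      return `${msg.payload.userId} is ${msg.payload.online ? "online" : "offline"}`;
    default: {
      // Adding a new message type without a case here fails to compile.
      const unreachable: never = msg;
      return unreachable;
    }
  }
}
```

A misspelled type such as "chta" is now rejected explicitly by parseServerMessage instead of silently matching no handler.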
System Design: Real-Time Notification System
Design a notification system that delivers messages to 1M concurrent users. Cover: connection management, fan-out strategy (per-connection vs pub/sub), graceful degradation when WebSocket isn't available, message ordering guarantees, and what happens when a user reconnects after being offline for 2 hours. Discuss the tradeoffs between WebSocket and SSE for this specific use case.