WebSocket Protocol and Implementation

Advanced · 18 min read

HTTP Is a Monologue. WebSocket Is a Conversation.

HTTP was designed for documents: you ask, I respond, we're done. But real-time apps need something different: a persistent, bidirectional channel where either side can send data at any time, without the overhead of a new request (and a full set of headers) per message.

That's WebSocket. One TCP connection, full-duplex communication, and a frame-based protocol that's surprisingly simple under the hood. Let's tear it apart.

The Opening Handshake

WebSocket doesn't replace HTTP — it hijacks it. The connection starts as a normal HTTP request, then "upgrades" to the WebSocket protocol. This is clever because it means WebSocket traffic passes through most proxies and firewalls that already allow HTTP.

GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Origin: https://example.com

The server responds:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

That Sec-WebSocket-Accept value? The server concatenates your Sec-WebSocket-Key with the magic string 258EAFA5-E914-47DA-95CA-5AB0F964CE65, takes the SHA-1 hash, and base64-encodes it. This isn't security. It's a handshake check: only a server that actually understands WebSocket will produce the right value, which keeps a plain HTTP server (or a cached response) from being mistaken for a WebSocket endpoint.
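The derivation described above can be sketched in a few lines. This assumes a Node.js runtime (for node:crypto); computeAcceptValue is a hypothetical helper name, not part of any library.

```typescript
import { createHash } from 'node:crypto';

// Fixed GUID defined by the WebSocket protocol for the opening handshake.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-5AB0F964CE65';

// Hypothetical helper: derives Sec-WebSocket-Accept from Sec-WebSocket-Key.
function computeAcceptValue(secWebSocketKey: string): string {
  return createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID) // key + magic string
    .digest('base64');                        // SHA-1, then base64
}

// The key from the request above yields the accept value from the response.
console.log(computeAcceptValue('dGhlIHNhbXBsZSBub25jZQ=='));
// → s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client performs the same computation and rejects the connection if the server's value does not match.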

Quiz
Why does the WebSocket handshake start with an HTTP request?

Frame Anatomy

Once the handshake completes, HTTP is gone. Everything after is WebSocket frames — a lean binary format.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+-------------------------------+
|     Masking-key (0 or 4 bytes)      |                         |
+-------------------------------------+                         |
|                    Payload Data                                |
+---------------------------------------------------------------+

The key fields:

  • FIN bit: Is this the final fragment? Most messages are a single frame (FIN=1). Large messages can be fragmented across multiple frames.
  • Opcode: What type of frame. 0x0 for a continuation fragment, 0x1 for text (UTF-8), 0x2 for binary, 0x8 for close, 0x9 for ping, 0xA for pong.
  • MASK bit: Client-to-server frames MUST be masked. Server-to-client frames MUST NOT be. This asymmetry exists to prevent cache poisoning attacks on intermediary proxies.
  • Payload length: 7 bits for payloads up to 125 bytes, 16 bits for up to 65,535 bytes, 64 bits for larger.
Mental Model

Think of a WebSocket frame like a letter in an envelope. The first 2 bytes are the envelope label (type, size hint, masking). Then optionally more size info, optionally a masking key, and finally the letter itself (payload). The overhead is just 2-14 bytes per frame — compared to HTTP headers that can be hundreds of bytes per request.
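To make the envelope concrete, here is a minimal sketch of reading the header fields described above and undoing the mask. The names parseFrameHeader, unmask, and FrameHeader are hypothetical, and the sketch assumes the whole frame is already in memory; a real parser also has to handle partial reads and fragmentation.

```typescript
interface FrameHeader {
  fin: boolean;
  opcode: number;
  masked: boolean;
  payloadLength: number;
  headerLength: number;          // bytes before the payload, masking key included
  maskingKey: Uint8Array | null;
}

function parseFrameHeader(bytes: Uint8Array): FrameHeader {
  if (bytes.length < 2) throw new Error('A frame header is at least 2 bytes');
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  const fin = (bytes[0] & 0x80) !== 0;     // top bit of byte 0
  const opcode = bytes[0] & 0x0f;          // low 4 bits of byte 0
  const masked = (bytes[1] & 0x80) !== 0;  // top bit of byte 1
  let payloadLength = bytes[1] & 0x7f;     // low 7 bits of byte 1
  let offset = 2;

  if (payloadLength === 126) {
    payloadLength = view.getUint16(offset);  // 16-bit big-endian length
    offset += 2;
  } else if (payloadLength === 127) {
    const high = view.getUint32(offset);     // 64-bit big-endian length,
    const low = view.getUint32(offset + 4);  // read as two 32-bit halves
    if (high > 0x1fffff) throw new Error('Payload length exceeds safe integer range');
    payloadLength = high * 4294967296 + low;
    offset += 8;
  }

  let maskingKey: Uint8Array | null = null;
  if (masked) {
    if (bytes.length < offset + 4) throw new Error('Truncated masking key');
    maskingKey = bytes.slice(offset, offset + 4);
    offset += 4;
  }
  return { fin, opcode, masked, payloadLength, headerLength: offset, maskingKey };
}

// Masking is a byte-wise XOR with the 4-byte key, repeated over the payload.
function unmask(payload: Uint8Array, key: Uint8Array): Uint8Array {
  const out = new Uint8Array(payload.length);
  for (let i = 0; i < payload.length; i++) out[i] = payload[i] ^ key[i % 4];
  return out;
}

// A single masked text frame carrying "Hello".
const sampleFrame = new Uint8Array([0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58]);
const sampleHeader = parseFrameHeader(sampleFrame);
const samplePayload = sampleFrame.subarray(sampleHeader.headerLength);
console.log(sampleHeader.opcode, sampleHeader.payloadLength,
  new TextDecoder().decode(unmask(samplePayload, sampleHeader.maskingKey!)));
// → 1 5 Hello
```

Here the header is 6 bytes: the 2-byte label plus the 4-byte masking key, with no extended length because the payload is under 126 bytes.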

Binary vs Text Frames

WebSocket supports two data types: text (opcode 0x1) and binary (opcode 0x2).

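const ws = new WebSocket('wss://api.example.com/stream');

// Receive binary frames as ArrayBuffer instead of the default Blob.
ws.binaryType = 'arraybuffer';

ws.addEventListener('open', () => {
  // A string is sent as a text frame (opcode 0x1).
  ws.send('Hello');

  // An ArrayBuffer is sent as a binary frame (opcode 0x2).
  const buffer = new ArrayBuffer(4);
  const view = new DataView(buffer);
  view.setFloat32(0, 3.14); // big-endian by default
  ws.send(buffer);
});

ws.addEventListener('message', (event) => {
  if (typeof event.data === 'string') {
    // Text frame: this example assumes the server sends JSON.
    const parsed = JSON.parse(event.data);
    console.log('text message', parsed);
  } else {
    // Binary frame: read back the 32-bit float.
    const view = new DataView(event.data);
    const value = view.getFloat32(0);
    console.log('binary message', value);
  }
});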

Text frames are validated as UTF-8 by the protocol. Binary frames are raw bytes — you decide the encoding. For structured data, JSON over text frames is the default choice. For high-throughput scenarios (gaming, audio streaming, large datasets), binary with a schema (Protocol Buffers, MessagePack, or FlatBuffers) cuts serialization overhead dramatically.
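As a rough illustration of what "binary with a schema" buys you, here is a hand-rolled fixed layout for a position update. The 13-byte layout, the PositionUpdate shape, and the function names are all assumptions made for this sketch; in practice a library such as Protocol Buffers or MessagePack would define the encoding.

```typescript
// Hypothetical wire layout (13 bytes, big-endian, DataView's default):
//   byte 0      message type
//   bytes 1-4   player id (uint32)
//   bytes 5-8   x (float32)
//   bytes 9-12  y (float32)
interface PositionUpdate {
  playerId: number;
  x: number;
  y: number;
}

const POSITION_UPDATE = 0x01;
const POSITION_UPDATE_SIZE = 13;

function encodePosition(update: PositionUpdate): ArrayBuffer {
  const buffer = new ArrayBuffer(POSITION_UPDATE_SIZE);
  const view = new DataView(buffer);
  view.setUint8(0, POSITION_UPDATE);
  view.setUint32(1, update.playerId);
  view.setFloat32(5, update.x);
  view.setFloat32(9, update.y);
  return buffer;
}

function decodePosition(buffer: ArrayBuffer): PositionUpdate {
  const view = new DataView(buffer);
  if (buffer.byteLength !== POSITION_UPDATE_SIZE || view.getUint8(0) !== POSITION_UPDATE) {
    throw new Error('Not a position update');
  }
  return {
    playerId: view.getUint32(1),
    x: view.getFloat32(5),
    y: view.getFloat32(9),
  };
}

// Round trip: the buffer returned by encodePosition is what ws.send() would take.
const encoded = encodePosition({ playerId: 42, x: 1.5, y: -2.25 });
console.log(encoded.byteLength, decodePosition(encoded));
```

The same update as JSON needs the field names and decimal digits on the wire for every message, which is where the savings come from at high message rates.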

Quiz
Why must client-to-server WebSocket frames be masked?

Heartbeats: Keeping the Connection Alive

TCP connections can go silent without either side knowing the other is gone. NAT routers and load balancers also drop idle connections, often after as little as 30-60 seconds of silence (the exact timeout depends on the device and its configuration). The WebSocket protocol includes ping/pong frames for this:

function createHeartbeat(ws: WebSocket, intervalMs = 30000) {
  let missedPongs = 0;
  let pingTimer: ReturnType<typeof setInterval> | undefined;
  let pongTimer: ReturnType<typeof setTimeout> | undefined;

  const stop = () => {
    clearInterval(pingTimer);
    clearTimeout(pongTimer);
  };

  const startPing = () => {
    stop(); // avoid stacking timers if startPing is called twice
    pingTimer = setInterval(() => {
      if (ws.readyState !== WebSocket.OPEN) {
        stop();
        return;
      }
      // Give up once three consecutive pings have gone unanswered.
      if (missedPongs >= 3) {
        ws.close(4000, 'Heartbeat timeout');
        stop();
        return;
      }
      ws.send(JSON.stringify({ type: '__ping', ts: Date.now() }));
      // Count a miss only if no pong arrives within 5 seconds.
      clearTimeout(pongTimer);
      pongTimer = setTimeout(() => {
        missedPongs++;
      }, 5000);
    }, intervalMs);
  };

  // Call this when a '__pong' message arrives from the server.
  const handlePong = () => {
    missedPongs = 0;
    clearTimeout(pongTimer);
  };

  return { startPing, handlePong, stop };
}
Info

The browser WebSocket API does NOT expose ping/pong frames — the browser handles them automatically at the protocol level. For application-level heartbeats, you need to implement your own ping/pong over regular data frames, as shown above. The server side (Node.js ws library) does give you access to protocol-level ping/pong.
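On the server, protocol-level heartbeats usually take the form of a periodic sweep: any connection that has not answered the previous ping is terminated, and every other connection is pinged again. The sketch below uses a minimal stand-in interface rather than importing the ws library, so PingableSocket and sweepConnections are hypothetical names; with ws, each socket's 'pong' event handler would set isAlive back to true.

```typescript
// Minimal stand-in for a server-side socket (an assumption for this sketch).
interface PingableSocket {
  isAlive: boolean;   // set to true again by the socket's pong handler
  ping(): void;       // sends a protocol-level ping frame
  terminate(): void;  // drops the connection without a closing handshake
}

// Run on an interval (for example every 30 seconds).
// Returns how many dead connections were terminated.
function sweepConnections(clients: readonly PingableSocket[]): number {
  let terminated = 0;
  for (const socket of clients) {
    if (!socket.isAlive) {
      // No pong since the previous sweep: treat the peer as gone.
      socket.terminate();
      terminated++;
      continue;
    }
    // Mark as unconfirmed; the pong handler flips this back to true.
    socket.isAlive = false;
    socket.ping();
  }
  return terminated;
}
```

A peer therefore gets one full interval to respond before it is dropped, which bounds detection time at roughly two intervals.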

Building a Production WebSocket Client

A real-world WebSocket client needs: automatic reconnection, message buffering during disconnects, typed message handling, and connection state management. Here's an implementation that covers those basics and makes a reasonable starting point for production use:

type ConnectionState = 'connecting' | 'connected' | 'reconnecting' | 'disconnected';
type MessageHandler = (data: unknown) => void;

interface WSClientOptions {
  url: string;
  protocols?: string[];
  maxReconnectAttempts?: number;
  baseReconnectDelay?: number;
  heartbeatInterval?: number;
}

class WSClient {
  private ws: WebSocket | null = null;
  private state: ConnectionState = 'disconnected';
  private reconnectAttempts = 0;
  private messageQueue: string[] = [];
  private handlers = new Map<string, Set<MessageHandler>>();
  private reconnectTimer: ReturnType<typeof setTimeout> | null = null;
  private heartbeatTimer: ReturnType<typeof setInterval> | null = null;

  constructor(private opts: WSClientOptions) {}

  connect(): void {
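    // Don't open a second socket while one is open or still connecting.
    if (
      this.ws?.readyState === WebSocket.OPEN ||
      this.ws?.readyState === WebSocket.CONNECTING
    ) return;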

    this.state = this.reconnectAttempts > 0 ? 'reconnecting' : 'connecting';
    this.ws = new WebSocket(this.opts.url, this.opts.protocols);

    this.ws.addEventListener('open', () => {
      this.state = 'connected';
      this.reconnectAttempts = 0;
      this.flushQueue();
      this.startHeartbeat();
    });

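    this.ws.addEventListener('message', (event) => {
      // Ignore binary frames and malformed JSON instead of throwing in the listener.
      if (typeof event.data !== 'string') return;
      let msg: { type?: string; payload?: unknown };
      try {
        msg = JSON.parse(event.data);
      } catch {
        return;
      }
      if (!msg || typeof msg.type !== 'string' || msg.type === '__pong') return;
      const handlers = this.handlers.get(msg.type);
      if (handlers) {
        handlers.forEach((fn) => fn(msg.payload));
      }
    });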

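    this.ws.addEventListener('close', (event) => {
      this.stopHeartbeat();
      // disconnect() sets state to 'disconnected' before this event fires, so a
      // deliberate close never triggers a reconnect, even if the closing
      // handshake fails and the browser reports code 1006 instead of 1000.
      if (event.code !== 1000 && this.state !== 'disconnected') {
        this.scheduleReconnect();
      } else {
        this.state = 'disconnected';
      }
    });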

    this.ws.addEventListener('error', () => {
      this.ws?.close();
    });
  }

  send(type: string, payload: unknown): void {
    const msg = JSON.stringify({ type, payload });
    if (this.ws?.readyState === WebSocket.OPEN) {
      this.ws.send(msg);
    } else {
      this.messageQueue.push(msg);
    }
  }

  on(type: string, handler: MessageHandler): () => void {
    if (!this.handlers.has(type)) {
      this.handlers.set(type, new Set());
    }
    this.handlers.get(type)!.add(handler);
    return () => this.handlers.get(type)?.delete(handler);
  }

  private flushQueue(): void {
    while (this.messageQueue.length > 0) {
      const msg = this.messageQueue.shift()!;
      this.ws?.send(msg);
    }
  }

  private scheduleReconnect(): void {
    const max = this.opts.maxReconnectAttempts ?? 10;
    if (this.reconnectAttempts >= max) {
      this.state = 'disconnected';
      return;
    }
    const base = this.opts.baseReconnectDelay ?? 1000;
    const delay = Math.min(base * Math.pow(2, this.reconnectAttempts), 30000);
    const jitter = delay * (0.5 + Math.random() * 0.5);
    this.reconnectAttempts++;
    this.state = 'reconnecting';
    this.reconnectTimer = setTimeout(() => this.connect(), jitter);
  }

  private startHeartbeat(): void {
    const interval = this.opts.heartbeatInterval ?? 30000;
    this.heartbeatTimer = setInterval(() => {
      if (this.ws?.readyState === WebSocket.OPEN) {
        this.ws.send(JSON.stringify({ type: '__ping', ts: Date.now() }));
      }
    }, interval);
  }

  private stopHeartbeat(): void {
    if (this.heartbeatTimer) {
      clearInterval(this.heartbeatTimer);
      this.heartbeatTimer = null;
    }
  }

  disconnect(): void {
    if (this.reconnectTimer) clearTimeout(this.reconnectTimer);
    this.stopHeartbeat();
    this.ws?.close(1000, 'Client disconnect');
    this.state = 'disconnected';
  }
}
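The delay calculation inside scheduleReconnect is easier to reason about when pulled out as a pure function. This is a sketch under the same defaults as the class (1 second base, 30 second cap); backoffDelay is a hypothetical name, and the random source is injectable only so the result can be checked deterministically.

```typescript
// Exponential backoff with jitter: the delay doubles per attempt up to a cap,
// then a random factor picks a point between 50% and 100% of that delay.
function backoffDelay(
  attempt: number,
  baseMs = 1000,
  capMs = 30000,
  random: () => number = Math.random,
): number {
  const exponential = Math.min(baseMs * Math.pow(2, attempt), capMs);
  return exponential * (0.5 + random() * 0.5);
}

// Upper bound of the delay for the first few attempts (random factor fixed near 1).
for (let attempt = 0; attempt < 6; attempt++) {
  console.log(attempt, backoffDelay(attempt, 1000, 30000, () => 1));
}
// → 0 1000, 1 2000, 2 4000, 3 8000, 4 16000, 5 30000
```

Without the random factor, every client that lost its connection at the same moment (say, during a server restart) would retry at the same instants; spreading retries over half the window smooths out that spike.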
Quiz
Why does the reconnection use jitter added to exponential backoff?

Security: WSS, Origin Validation, Authentication

Key Rules
  1. Always use WSS (WebSocket Secure) in production; plain WS traffic is visible to any intermediary
  2. Validate the Origin header server-side to prevent cross-site WebSocket hijacking
  3. Authenticate via a short-lived token in the URL or first message; cookies alone are vulnerable to CSWSH
  4. Set message size limits server-side to prevent memory exhaustion attacks
  5. Rate-limit messages per client to prevent abuse
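For the rate-limiting rule, one common approach is a token bucket per connection: each message spends a token, and tokens refill at a steady rate up to a burst capacity. This is a sketch of that technique, not something the rules above prescribe; TokenBucket and its parameters are hypothetical, and the clock is injectable so the behavior can be checked without waiting.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity;   // start full so short bursts are allowed
    this.lastRefill = now();
  }

  // Returns true if the message may be processed, false if it should be dropped.
  tryConsume(): boolean {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = current;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// One bucket per connection: bursts of up to 20 messages, 5 per second sustained.
const bucket = new TokenBucket(20, 5);
console.log(bucket.tryConsume()); // → true
```

What to do when tryConsume returns false (drop the message, send an error, or close the connection) is a policy decision for the application.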

The most common authentication pattern:

const token = await fetchShortLivedToken();
const ws = new WebSocket(`wss://api.example.com/ws?token=${token}`);

Why not cookies? Because the browser attaches cookies to the WebSocket handshake the same way it does for other requests, and WebSocket connections are not subject to the same-origin policy. Unless a SameSite restriction on the cookie blocks it, an attacker's page at evil.com could open a WebSocket to your server and the browser would happily send your session cookie along with it. This is Cross-Site WebSocket Hijacking (CSWSH).
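The server-side defense is to compare the handshake's Origin header against an allowlist before accepting the upgrade. A minimal sketch, assuming the two allowed origins below (they are placeholders) and a hypothetical isAllowedOrigin helper:

```typescript
// Placeholder allowlist; replace with the origins that serve your front end.
const ALLOWED_ORIGINS: ReadonlySet<string> = new Set([
  'https://example.com',
  'https://app.example.com',
]);

function isAllowedOrigin(
  originHeader: string | undefined,
  allowed: ReadonlySet<string> = ALLOWED_ORIGINS,
): boolean {
  // Browsers always send Origin on WebSocket handshakes. Rejecting requests
  // without it is a policy choice that also blocks non-browser clients.
  if (!originHeader) return false;
  let parsed: URL;
  try {
    parsed = new URL(originHeader);
  } catch {
    return false; // not a valid absolute URL, e.g. the literal string "null"
  }
  // Compare the normalized scheme + host + port exactly. Substring or
  // prefix checks would accept https://example.com.evil.com.
  return allowed.has(parsed.origin);
}

console.log(isAllowedOrigin('https://example.com'));          // → true
console.log(isAllowedOrigin('https://example.com.evil.com')); // → false
```

The check belongs in the HTTP upgrade handler, before the 101 response is sent, so a rejected handshake never becomes a WebSocket connection.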

Common Trap

Never put long-lived secrets in WebSocket URLs. The URL appears in server access logs, browser history, and potentially intermediary proxy logs. Use a short-lived, single-use token that expires within seconds.
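One way to get a short-lived, single-use token is a server-side ticket store: an authenticated HTTP endpoint issues a random ticket, and the WebSocket handshake redeems it exactly once. The sketch below assumes Node.js (for randomUUID) and a single server process with in-memory state; TicketStore and its method names are hypothetical.

```typescript
import { randomUUID } from 'node:crypto';

class TicketStore {
  private readonly tickets = new Map<string, { userId: string; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs = 10000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Called from an authenticated HTTP endpoint (the fetchShortLivedToken side).
  issue(userId: string): string {
    const ticket = randomUUID();
    this.tickets.set(ticket, { userId, expiresAt: this.now() + this.ttlMs });
    return ticket;
  }

  // Called during the WebSocket handshake. Returns the user id, or null if the
  // ticket is unknown, already used, or expired.
  consume(ticket: string): string | null {
    const entry = this.tickets.get(ticket);
    if (!entry) return null;
    this.tickets.delete(ticket); // single use: gone after the first attempt
    return this.now() <= entry.expiresAt ? entry.userId : null;
  }
}

const store = new TicketStore();
const ticket = store.issue('user-1');
console.log(store.consume(ticket)); // → user-1
console.log(store.consume(ticket)); // → null
```

Because a ticket is useless after one redemption or a few seconds, whichever comes first, a copy that ends up in an access log cannot be replayed. A multi-server deployment would need shared storage in place of the Map, and a periodic cleanup of tickets that were never redeemed.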

WebSocket vs WebTransport

WebSocket runs over TCP. That means head-of-line blocking: if one packet is lost, everything behind it waits for the retransmission. For many real-time scenarios this is acceptable. But for applications that send multiple independent streams (multiplayer games, video conferencing), this becomes a bottleneck.

WebTransport is the next generation — it runs over HTTP/3 (QUIC), which means:

  • Multiple independent streams — packet loss in one stream doesn't block others
  • Unreliable datagrams — fire-and-forget messages (perfect for position updates in games)
  • Built-in multiplexing — no need for application-level channel management
  • Native congestion control — QUIC handles this at the transport layer
Feature               | WebSocket         | WebTransport
----------------------|-------------------|--------------------------------
Transport             | TCP               | QUIC (UDP-based)
Multiplexing          | Single stream     | Multiple independent streams
Unreliable mode       | No                | Yes (datagrams)
Head-of-line blocking | Yes               | No (per-stream)
Browser support       | Universal         | Chrome, Edge, Firefox (growing)
Maturity              | Stable since 2011 | Emerging standard
Quiz
What is the main advantage of WebTransport over WebSocket for a multiplayer game?

When to Skip Socket.io

Socket.io adds automatic reconnection, room management, event namespaces, and fallback to long-polling. It's a solid abstraction if you need broad compatibility and don't want to build the infrastructure we covered above.

But it comes with costs: a custom protocol on top of WebSocket (meaning you can't connect with a plain WebSocket client), larger bundle size (~45KB min+gzipped for the client), and behavior that can be surprising (automatic reconnection with defaults you might not want).

What developers do vs. what they should do

  • What developers do: Using Socket.io because WebSocket is too low-level
    What they should do: Build a thin wrapper (like the WSClient above) when you only need basic features
    Why: Socket.io's abstraction layer adds complexity and bundle size. For most apps, a focused wrapper over the native WebSocket API gives you exactly what you need without the overhead.
  • What developers do: Sending JSON for every message type without a schema
    What they should do: Define a typed message protocol with discriminated unions
    Why: Type-safe message contracts catch protocol mismatches at compile time. Without them, a typo in a message type silently fails at runtime.
  • What developers do: Relying on WebSocket close events to detect disconnection
    What they should do: Use application-level heartbeats to detect silent connection failures
    Why: TCP can take minutes to detect a dead peer. NAT routers silently drop idle connections. Only application-level ping/pong gives you timely detection.
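A typed message protocol with discriminated unions might look like the sketch below. The 'chat' and 'presence' message shapes are invented for illustration ('__pong' comes from the heartbeat earlier), and parseServerMessage and formatMessage are hypothetical names. The union gives compile-time checking inside the app; the parser validates what actually arrives over the wire.

```typescript
type ServerMessage =
  | { type: 'chat'; payload: { from: string; text: string } }
  | { type: 'presence'; payload: { userId: string; online: boolean } }
  | { type: '__pong'; payload: { ts: number } };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

// Returns a typed message, or null for anything that doesn't match the protocol.
function parseServerMessage(raw: string): ServerMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (!isRecord(data) || !isRecord(data.payload)) return null;
  const p = data.payload;
  switch (data.type) {
    case 'chat':
      return typeof p.from === 'string' && typeof p.text === 'string'
        ? { type: 'chat', payload: { from: p.from, text: p.text } }
        : null;
    case 'presence':
      return typeof p.userId === 'string' && typeof p.online === 'boolean'
        ? { type: 'presence', payload: { userId: p.userId, online: p.online } }
        : null;
    case '__pong':
      return typeof p.ts === 'number' ? { type: '__pong', payload: { ts: p.ts } } : null;
    default:
      return null; // unknown or misspelled type
  }
}

// The switch narrows msg.payload per case; a misspelled case label or a
// missing variant is a compile-time error rather than a silent runtime miss.
function formatMessage(msg: ServerMessage): string {
  switch (msg.type) {
    case 'chat':
      return `${msg.payload.from}: ${msg.payload.text}`;
    case 'presence':
      return `${msg.payload.userId} is ${msg.payload.online ? 'online' : 'offline'}`;
    case '__pong':
      return `pong ${msg.payload.ts}`;
  }
}
```

For larger protocols, a schema validation library can replace the hand-written checks in parseServerMessage while keeping the same union type.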
Interview Question

System Design: Real-Time Notification System

Design a notification system that delivers messages to 1M concurrent users. Cover: connection management, fan-out strategy (per-connection vs pub/sub), graceful degradation when WebSocket isn't available, message ordering guarantees, and what happens when a user reconnects after being offline for 2 hours. Discuss the tradeoffs between WebSocket and SSE for this specific use case.
