CRDTs: Theory and Types

advanced20 min read

What Makes a CRDT

A Conflict-free Replicated Data Type is a data structure designed so that replicas can be updated independently (without coordination) and are guaranteed to converge to the same state when they've seen the same set of updates — regardless of the order those updates are applied.

No central server. No locking. No consensus protocol. Just math.

The guarantee is called Strong Eventual Consistency (SEC): if two replicas have received the same set of updates (in any order), they are in the same state. This is stronger than regular eventual consistency, which only promises they'll converge "eventually" with no concrete bound.

The Three Properties

A CRDT merge function must satisfy three algebraic properties:

Commutativity: merge(a, b) = merge(b, a) Applying A then B gives the same result as B then A. Order doesn't matter.

Associativity: merge(merge(a, b), c) = merge(a, merge(b, c)) Grouping doesn't matter. You can merge updates in any batches.

Idempotence: merge(a, a) = a Applying the same update twice is the same as applying it once. Duplicates are harmless.

Together, these properties mean you can deliver updates in any order, with duplicates, with arbitrary batching, and the result is always the same. Network reordering, retries, and partial failures are all handled by the math.

Mental Model

Think of CRDT merge like pouring water into a container. The final water level (state) depends only on how much water was poured (updates), not the order of pouring. If you accidentally pour the same cup twice, the container's state is the same as if you poured it once (idempotence via duplicate detection). Whether you pour cup A then B, or B then A, the water level is identical (commutativity).

State-Based vs Operation-Based

There are two flavors of CRDTs:

State-Based (CvRDT — Convergent)

Replicas periodically send their entire state to other replicas. The receiving replica merges the incoming state with its own using a join function (least upper bound in a semilattice).

interface CvRDT<T> {
  state(): T;
  merge(remote: T): void;
}

Pros: Simple to implement, tolerates message loss and reordering naturally
Cons: Sending full state is expensive for large data structures

Operation-Based (CmRDT — Commutative)

Replicas broadcast operations (not state). Operations must be commutative.

interface CmRDT<Op> {
  apply(op: Op): void;
  generateOp(...args: unknown[]): Op;
}

Pros: Smaller messages (just the operation, not full state)
Cons: Requires reliable delivery (all operations must arrive at all replicas) and exactly-once semantics (or idempotent operations)

In practice, most real-world CRDT libraries (Yjs, Automerge) use hybrid approaches that combine aspects of both.

Quiz

What is the key difference between state-based and operation-based CRDTs?

ABCD

G-Counter (Grow-Only Counter)

The simplest useful CRDT. Each node has its own counter that only increases. The total is the sum of all nodes.

class GCounter {
  private counts: Map<string, number>;

  constructor(private nodeId: string, nodeIds: string[]) {
    this.counts = new Map(nodeIds.map((id) => [id, 0]));
  }

  increment(amount = 1): void {
    const current = this.counts.get(this.nodeId) ?? 0;
    this.counts.set(this.nodeId, current + amount);
  }

  value(): number {
    let total = 0;
    for (const count of this.counts.values()) {
      total += count;
    }
    return total;
  }

  merge(remote: Map<string, number>): void {
    for (const [nodeId, count] of remote) {
      const local = this.counts.get(nodeId) ?? 0;
      this.counts.set(nodeId, Math.max(local, count));
    }
  }

  state(): Map<string, number> {
    return new Map(this.counts);
  }
}

The merge uses Math.max per node — this is the join operation in the semilattice. It's commutative (max(a,b) = max(b,a)), associative (max(max(a,b),c) = max(a,max(b,c))), and idempotent (max(a,a) = a).

Quiz

Node A has G-Counter state {A:5, B:3, C:2}. Node B has state {A:3, B:7, C:2}. After merging, what is the counter value at both nodes?

ABCD

PN-Counter (Positive-Negative Counter)

G-Counter only goes up. For a counter that can also decrement, use two G-Counters: one for increments, one for decrements.

class PNCounter {
  private positive: GCounter;
  private negative: GCounter;

  constructor(nodeId: string, nodeIds: string[]) {
    this.positive = new GCounter(nodeId, nodeIds);
    this.negative = new GCounter(nodeId, nodeIds);
  }

  increment(amount = 1): void {
    this.positive.increment(amount);
  }

  decrement(amount = 1): void {
    this.negative.increment(amount);
  }

  value(): number {
    return this.positive.value() - this.negative.value();
  }

  merge(remote: { positive: Map<string, number>; negative: Map<string, number> }): void {
    this.positive.merge(remote.positive);
    this.negative.merge(remote.negative);
  }
}

Info

A PN-Counter can go negative. If you need a counter that stays at zero or above (like an inventory count), you need a Bounded Counter — a more complex CRDT that coordinates to prevent the value from dropping below the bound.

Sets: G-Set, 2P-Set, OR-Set

G-Set (Grow-Only Set)

Elements can be added but never removed. Merge is set union.

class GSet<T> {
  private elements = new Set<T>();

  add(element: T): void {
    this.elements.add(element);
  }

  has(element: T): boolean {
    return this.elements.has(element);
  }

  merge(remote: Set<T>): void {
    for (const element of remote) {
      this.elements.add(element);
    }
  }

  value(): Set<T> {
    return new Set(this.elements);
  }
}

2P-Set (Two-Phase Set)

Like a PN-Counter but for sets: one G-Set for adds, one for removes. Once an element is removed, it can never be re-added (the remove set is checked first).

class TwoPSet<T> {
  private added = new Set<T>();
  private removed = new Set<T>();

  add(element: T): void {
    this.added.add(element);
  }

  remove(element: T): void {
    if (this.added.has(element)) {
      this.removed.add(element);
    }
  }

  has(element: T): boolean {
    return this.added.has(element) && !this.removed.has(element);
  }

  merge(remote: { added: Set<T>; removed: Set<T> }): void {
    for (const e of remote.added) this.added.add(e);
    for (const e of remote.removed) this.removed.add(e);
  }
}

The limitation: once removed, an element is gone forever. That's often too restrictive.

OR-Set (Observed-Remove Set)

The most practical set CRDT. Each add generates a unique tag. Remove removes all currently observed tags for that element. Re-adding creates a new tag.

class ORSet<T> {
  private elements = new Map<T, Set<string>>();
  private tombstones = new Set<string>();

  constructor(private nodeId: string) {}

  add(element: T): void {
    const tag = `${this.nodeId}-${Date.now()}-${Math.random()}`;
    if (!this.elements.has(element)) {
      this.elements.set(element, new Set());
    }
    this.elements.get(element)!.add(tag);
  }

  remove(element: T): void {
    const tags = this.elements.get(element);
    if (tags) {
      for (const tag of tags) {
        this.tombstones.add(tag);
      }
      this.elements.delete(element);
    }
  }

  has(element: T): boolean {
    const tags = this.elements.get(element);
    if (!tags) return false;
    for (const tag of tags) {
      if (!this.tombstones.has(tag)) return true;
    }
    return false;
  }

  merge(remote: { elements: Map<T, Set<string>>; tombstones: Set<string> }): void {
    for (const [element, remoteTags] of remote.elements) {
      if (!this.elements.has(element)) {
        this.elements.set(element, new Set());
      }
      const localTags = this.elements.get(element)!;
      for (const tag of remoteTags) {
        if (!this.tombstones.has(tag)) {
          localTags.add(tag);
        }
      }
    }
    for (const tag of remote.tombstones) {
      this.tombstones.add(tag);
      for (const [element, tags] of this.elements) {
        tags.delete(tag);
        if (tags.size === 0) this.elements.delete(element);
      }
    }
  }
}

Quiz

In an OR-Set, Alice adds element X, Bob concurrently removes element X (which he had previously added). After merge, is X in the set?

ABCD

Registers: LWW-Register and MV-Register

A register holds a single value. The question is: what happens when two replicas write different values concurrently?

LWW-Register (Last Writer Wins)

Each write is tagged with a timestamp. The value with the highest timestamp wins.

class LWWRegister<T> {
  private val: T;
  private timestamp: number;
  private nodeId: string;

  constructor(initial: T, nodeId: string) {
    this.val = initial;
    this.timestamp = 0;
    this.nodeId = nodeId;
  }

  set(value: T): void {
    this.timestamp = Date.now();
    this.val = value;
  }

  get(): T {
    return this.val;
  }

  merge(remote: { value: T; timestamp: number; nodeId: string }): void {
    if (
      remote.timestamp > this.timestamp ||
      (remote.timestamp === this.timestamp && remote.nodeId > this.nodeId)
    ) {
      this.val = remote.value;
      this.timestamp = remote.timestamp;
    }
  }
}

Common Trap

LWW-Register depends on loosely synchronized clocks. If Node A's clock is 5 minutes ahead, its writes will always win regardless of when they actually happened. For this reason, LWW-Register should use logical clocks (Lamport timestamps) or hybrid logical clocks instead of wall-clock time.

MV-Register (Multi-Value Register)

Instead of picking a winner, keep all concurrent values and let the application (or user) resolve the conflict. Amazon's DynamoDB famously uses this approach (shopping cart example).

class MVRegister<T> {
  private values: Array<{ value: T; clock: Map<string, number> }> = [];

  set(value: T, clock: Map<string, number>): void {
    this.values = [{ value, clock }];
  }

  get(): T[] {
    return this.values.map((v) => v.value);
  }

  merge(remote: Array<{ value: T; clock: Map<string, number> }>): void {
    const all = [...this.values, ...remote];
    this.values = all.filter((a) =>
      !all.some((b) => a !== b && vectorClockDominates(b.clock, a.clock))
    );
  }
}

RGA: Replicated Growable Array (Text CRDT)

Text is the hardest CRDT application. RGA (Replicated Growable Array) is one approach: each character gets a unique, globally ordered ID. Instead of "insert at position 3," the operation is "insert after character with ID xyz."

interface RGANode {
  id: { nodeId: string; seq: number };
  char: string;
  deleted: boolean;
  next: RGANode | null;
}

class RGA {
  private head: RGANode;
  private nodeId: string;
  private seq = 0;

  constructor(nodeId: string) {
    this.nodeId = nodeId;
    this.head = {
      id: { nodeId: '__root', seq: 0 },
      char: '',
      deleted: false,
      next: null,
    };
  }

  insert(afterId: { nodeId: string; seq: number }, char: string): RGANode {
    const newId = { nodeId: this.nodeId, seq: ++this.seq };
    const newNode: RGANode = { id: newId, char, deleted: false, next: null };

    let target = this.findNode(afterId);
    if (!target) throw new Error('Target node not found');

    while (
      target.next &&
      !target.next.deleted &&
      this.compareIds(target.next.id, newId) > 0
    ) {
      target = target.next;
    }

    newNode.next = target.next;
    target.next = newNode;
    return newNode;
  }

  delete(id: { nodeId: string; seq: number }): void {
    const node = this.findNode(id);
    if (node) node.deleted = true;
  }

  text(): string {
    let result = '';
    let current = this.head.next;
    while (current) {
      if (!current.deleted) result += current.char;
      current = current.next;
    }
    return result;
  }

  private findNode(id: { nodeId: string; seq: number }): RGANode | null {
    let current: RGANode | null = this.head;
    while (current) {
      if (current.id.nodeId === id.nodeId && current.id.seq === id.seq) {
        return current;
      }
      current = current.next;
    }
    return null;
  }

  private compareIds(
    a: { nodeId: string; seq: number },
    b: { nodeId: string; seq: number }
  ): number {
    if (a.seq !== b.seq) return a.seq - b.seq;
    return a.nodeId.localeCompare(b.nodeId);
  }
}

Key design decisions in RGA:

Unique IDs: Each character has a (nodeId, sequence) pair that's globally unique
Tombstones: Deleted characters are marked as deleted, not removed (other replicas might reference them for insertion points)
Ordering: When two inserts happen at the same position, the comparison function (sequence number, then node ID) provides a deterministic tiebreaker

Quiz

Why do text CRDTs like RGA use tombstones (marking deleted characters) instead of physically removing them?

ABCD

The Complete CRDT Taxonomy

CRDT	Operations	Merge Strategy	Typical Use
G-Counter	Increment only	Max per node	Page views, likes
PN-Counter	Increment and decrement	Two G-Counters (pos - neg)	Inventory, voting
G-Set	Add only	Set union	Tags, membership (no remove)
2P-Set	Add and remove (once)	Two G-Sets	User lists (no re-add)
OR-Set	Add and remove (re-add allowed)	Tag-based observed remove	General-purpose sets
LWW-Register	Set value	Highest timestamp wins	Single-value fields, config
MV-Register	Set value	Keep all concurrent values	Shopping cart, conflict-visible
RGA	Insert and delete characters	ID-based ordered list	Collaborative text editing

Key Rules

1CRDTs guarantee convergence through algebraic properties: commutativity, associativity, idempotence
2State-based CRDTs send full state (robust but expensive); operation-based send ops (efficient but need reliable delivery)
3G-Counter and PN-Counter are the simplest CRDTs — use for counters and metrics
4OR-Set is the practical choice for sets — 2P-Set's no-re-add limitation is usually too restrictive
5LWW-Register silently discards concurrent writes — only use when 'last write wins' is acceptable semantics
6Text CRDTs use tombstones and unique IDs — they trade memory for correctness

Interview Question

Design: Distributed Shopping Cart

Design a shopping cart CRDT for an e-commerce platform with eventual consistency. Users can add items, remove items, and change quantities from multiple devices. The cart must: merge correctly when the phone and laptop make concurrent changes, handle the case where one device adds item X while another removes it, and maintain correct quantities when both devices increment the same item's count. Which CRDT types would you compose for this?