CRDTs: Theory and Types
What Makes a CRDT
A Conflict-free Replicated Data Type is a data structure designed so that replicas can be updated independently (without coordination) and are guaranteed to converge to the same state when they've seen the same set of updates — regardless of the order those updates are applied.
No central server. No locking. No consensus protocol. Just math.
The guarantee is called Strong Eventual Consistency (SEC): if two replicas have received the same set of updates (in any order), they are in the same state. This is stronger than regular eventual consistency, which only promises they'll converge "eventually" with no concrete bound.
The Three Properties
A CRDT merge function must satisfy three algebraic properties:
Commutativity: merge(a, b) = merge(b, a)
Applying A then B gives the same result as B then A. Order doesn't matter.
Associativity: merge(merge(a, b), c) = merge(a, merge(b, c))
Grouping doesn't matter. You can merge updates in any batches.
Idempotence: merge(a, a) = a
Applying the same update twice is the same as applying it once. Duplicates are harmless.
Together, these properties mean you can deliver updates in any order, with duplicates, with arbitrary batching, and the result is always the same. Network reordering, retries, and partial failures are all handled by the math.
Think of CRDT merge like pouring water into a container. The final water level (state) depends only on how much water was poured (updates), not the order of pouring. If you accidentally pour the same cup twice, the container's state is the same as if you poured it once (idempotence via duplicate detection). Whether you pour cup A then B, or B then A, the water level is identical (commutativity).
State-Based vs Operation-Based
There are two flavors of CRDTs:
State-Based (CvRDT — Convergent)
Replicas periodically send their entire state to other replicas. The receiving replica merges the incoming state with its own using a join function (least upper bound in a semilattice).
interface CvRDT<T> {
state(): T;
merge(remote: T): void;
}
- Pros: Simple to implement, tolerates message loss and reordering naturally
- Cons: Sending full state is expensive for large data structures
Operation-Based (CmRDT — Commutative)
Replicas broadcast operations (not state). Operations must be commutative.
interface CmRDT<Op> {
apply(op: Op): void;
generateOp(...args: unknown[]): Op;
}
- Pros: Smaller messages (just the operation, not full state)
- Cons: Requires reliable delivery (all operations must arrive at all replicas) and exactly-once semantics (or idempotent operations)
In practice, most real-world CRDT libraries (Yjs, Automerge) use hybrid approaches that combine aspects of both.
G-Counter (Grow-Only Counter)
The simplest useful CRDT. Each node has its own counter that only increases. The total is the sum of all nodes.
class GCounter {
private counts: Map<string, number>;
constructor(private nodeId: string, nodeIds: string[]) {
this.counts = new Map(nodeIds.map((id) => [id, 0]));
}
increment(amount = 1): void {
const current = this.counts.get(this.nodeId) ?? 0;
this.counts.set(this.nodeId, current + amount);
}
value(): number {
let total = 0;
for (const count of this.counts.values()) {
total += count;
}
return total;
}
merge(remote: Map<string, number>): void {
for (const [nodeId, count] of remote) {
const local = this.counts.get(nodeId) ?? 0;
this.counts.set(nodeId, Math.max(local, count));
}
}
state(): Map<string, number> {
return new Map(this.counts);
}
}
The merge uses Math.max per node — this is the join operation in the semilattice. It's commutative (max(a,b) = max(b,a)), associative (max(max(a,b),c) = max(a,max(b,c))), and idempotent (max(a,a) = a).
PN-Counter (Positive-Negative Counter)
G-Counter only goes up. For a counter that can also decrement, use two G-Counters: one for increments, one for decrements.
class PNCounter {
private positive: GCounter;
private negative: GCounter;
constructor(nodeId: string, nodeIds: string[]) {
this.positive = new GCounter(nodeId, nodeIds);
this.negative = new GCounter(nodeId, nodeIds);
}
increment(amount = 1): void {
this.positive.increment(amount);
}
decrement(amount = 1): void {
this.negative.increment(amount);
}
value(): number {
return this.positive.value() - this.negative.value();
}
merge(remote: { positive: Map<string, number>; negative: Map<string, number> }): void {
this.positive.merge(remote.positive);
this.negative.merge(remote.negative);
}
}
A PN-Counter can go negative. If you need a counter that stays at zero or above (like an inventory count), you need a Bounded Counter — a more complex CRDT that coordinates to prevent the value from dropping below the bound.
Sets: G-Set, 2P-Set, OR-Set
G-Set (Grow-Only Set)
Elements can be added but never removed. Merge is set union.
class GSet<T> {
private elements = new Set<T>();
add(element: T): void {
this.elements.add(element);
}
has(element: T): boolean {
return this.elements.has(element);
}
merge(remote: Set<T>): void {
for (const element of remote) {
this.elements.add(element);
}
}
value(): Set<T> {
return new Set(this.elements);
}
}
2P-Set (Two-Phase Set)
Like a PN-Counter but for sets: one G-Set for adds, one for removes. Once an element is removed, it can never be re-added (the remove set is checked first).
class TwoPSet<T> {
private added = new Set<T>();
private removed = new Set<T>();
add(element: T): void {
this.added.add(element);
}
remove(element: T): void {
if (this.added.has(element)) {
this.removed.add(element);
}
}
has(element: T): boolean {
return this.added.has(element) && !this.removed.has(element);
}
merge(remote: { added: Set<T>; removed: Set<T> }): void {
for (const e of remote.added) this.added.add(e);
for (const e of remote.removed) this.removed.add(e);
}
}
The limitation: once removed, an element is gone forever. That's often too restrictive.
OR-Set (Observed-Remove Set)
The most practical set CRDT. Each add generates a unique tag. Remove removes all currently observed tags for that element. Re-adding creates a new tag.
class ORSet<T> {
private elements = new Map<T, Set<string>>();
private tombstones = new Set<string>();
constructor(private nodeId: string) {}
add(element: T): void {
const tag = `${this.nodeId}-${Date.now()}-${Math.random()}`;
if (!this.elements.has(element)) {
this.elements.set(element, new Set());
}
this.elements.get(element)!.add(tag);
}
remove(element: T): void {
const tags = this.elements.get(element);
if (tags) {
for (const tag of tags) {
this.tombstones.add(tag);
}
this.elements.delete(element);
}
}
has(element: T): boolean {
const tags = this.elements.get(element);
if (!tags) return false;
for (const tag of tags) {
if (!this.tombstones.has(tag)) return true;
}
return false;
}
merge(remote: { elements: Map<T, Set<string>>; tombstones: Set<string> }): void {
for (const [element, remoteTags] of remote.elements) {
if (!this.elements.has(element)) {
this.elements.set(element, new Set());
}
const localTags = this.elements.get(element)!;
for (const tag of remoteTags) {
if (!this.tombstones.has(tag)) {
localTags.add(tag);
}
}
}
for (const tag of remote.tombstones) {
this.tombstones.add(tag);
for (const [element, tags] of this.elements) {
tags.delete(tag);
if (tags.size === 0) this.elements.delete(element);
}
}
}
}
Registers: LWW-Register and MV-Register
A register holds a single value. The question is: what happens when two replicas write different values concurrently?
LWW-Register (Last Writer Wins)
Each write is tagged with a timestamp. The value with the highest timestamp wins.
class LWWRegister<T> {
private val: T;
private timestamp: number;
private nodeId: string;
constructor(initial: T, nodeId: string) {
this.val = initial;
this.timestamp = 0;
this.nodeId = nodeId;
}
set(value: T): void {
this.timestamp = Date.now();
this.val = value;
}
get(): T {
return this.val;
}
merge(remote: { value: T; timestamp: number; nodeId: string }): void {
if (
remote.timestamp > this.timestamp ||
(remote.timestamp === this.timestamp && remote.nodeId > this.nodeId)
) {
this.val = remote.value;
this.timestamp = remote.timestamp;
}
}
}
LWW-Register depends on loosely synchronized clocks. If Node A's clock is 5 minutes ahead, its writes will always win regardless of when they actually happened. For this reason, LWW-Register should use logical clocks (Lamport timestamps) or hybrid logical clocks instead of wall-clock time.
MV-Register (Multi-Value Register)
Instead of picking a winner, keep all concurrent values and let the application (or user) resolve the conflict. Amazon's DynamoDB famously uses this approach (shopping cart example).
class MVRegister<T> {
private values: Array<{ value: T; clock: Map<string, number> }> = [];
set(value: T, clock: Map<string, number>): void {
this.values = [{ value, clock }];
}
get(): T[] {
return this.values.map((v) => v.value);
}
merge(remote: Array<{ value: T; clock: Map<string, number> }>): void {
const all = [...this.values, ...remote];
this.values = all.filter((a) =>
!all.some((b) => a !== b && vectorClockDominates(b.clock, a.clock))
);
}
}
RGA: Replicated Growable Array (Text CRDT)
Text is the hardest CRDT application. RGA (Replicated Growable Array) is one approach: each character gets a unique, globally ordered ID. Instead of "insert at position 3," the operation is "insert after character with ID xyz."
interface RGANode {
id: { nodeId: string; seq: number };
char: string;
deleted: boolean;
next: RGANode | null;
}
class RGA {
private head: RGANode;
private nodeId: string;
private seq = 0;
constructor(nodeId: string) {
this.nodeId = nodeId;
this.head = {
id: { nodeId: '__root', seq: 0 },
char: '',
deleted: false,
next: null,
};
}
insert(afterId: { nodeId: string; seq: number }, char: string): RGANode {
const newId = { nodeId: this.nodeId, seq: ++this.seq };
const newNode: RGANode = { id: newId, char, deleted: false, next: null };
let target = this.findNode(afterId);
if (!target) throw new Error('Target node not found');
while (
target.next &&
!target.next.deleted &&
this.compareIds(target.next.id, newId) > 0
) {
target = target.next;
}
newNode.next = target.next;
target.next = newNode;
return newNode;
}
delete(id: { nodeId: string; seq: number }): void {
const node = this.findNode(id);
if (node) node.deleted = true;
}
text(): string {
let result = '';
let current = this.head.next;
while (current) {
if (!current.deleted) result += current.char;
current = current.next;
}
return result;
}
private findNode(id: { nodeId: string; seq: number }): RGANode | null {
let current: RGANode | null = this.head;
while (current) {
if (current.id.nodeId === id.nodeId && current.id.seq === id.seq) {
return current;
}
current = current.next;
}
return null;
}
private compareIds(
a: { nodeId: string; seq: number },
b: { nodeId: string; seq: number }
): number {
if (a.seq !== b.seq) return a.seq - b.seq;
return a.nodeId.localeCompare(b.nodeId);
}
}
Key design decisions in RGA:
- Unique IDs: Each character has a (nodeId, sequence) pair that's globally unique
- Tombstones: Deleted characters are marked as deleted, not removed (other replicas might reference them for insertion points)
- Ordering: When two inserts happen at the same position, the comparison function (sequence number, then node ID) provides a deterministic tiebreaker
The Complete CRDT Taxonomy
| CRDT | Operations | Merge Strategy | Typical Use |
|---|---|---|---|
| G-Counter | Increment only | Max per node | Page views, likes |
| PN-Counter | Increment and decrement | Two G-Counters (pos - neg) | Inventory, voting |
| G-Set | Add only | Set union | Tags, membership (no remove) |
| 2P-Set | Add and remove (once) | Two G-Sets | User lists (no re-add) |
| OR-Set | Add and remove (re-add allowed) | Tag-based observed remove | General-purpose sets |
| LWW-Register | Set value | Highest timestamp wins | Single-value fields, config |
| MV-Register | Set value | Keep all concurrent values | Shopping cart, conflict-visible |
| RGA | Insert and delete characters | ID-based ordered list | Collaborative text editing |
- 1CRDTs guarantee convergence through algebraic properties: commutativity, associativity, idempotence
- 2State-based CRDTs send full state (robust but expensive); operation-based send ops (efficient but need reliable delivery)
- 3G-Counter and PN-Counter are the simplest CRDTs — use for counters and metrics
- 4OR-Set is the practical choice for sets — 2P-Set's no-re-add limitation is usually too restrictive
- 5LWW-Register silently discards concurrent writes — only use when 'last write wins' is acceptable semantics
- 6Text CRDTs use tombstones and unique IDs — they trade memory for correctness
Design: Distributed Shopping Cart
Design a shopping cart CRDT for an e-commerce platform with eventual consistency. Users can add items, remove items, and change quantities from multiple devices. The cart must: merge correctly when the phone and laptop make concurrent changes, handle the case where one device adds item X while another removes it, and maintain correct quantities when both devices increment the same item's count. Which CRDT types would you compose for this?