
Data Model and API Design

Advanced -- 25 min read

The Invisible Architecture

Here's a pattern you see at every company that scaled past 50 engineers: the frontend team rewrites their data layer. Not because the old one was buggy, but because nobody designed it. They just... started fetching data and stuffing it into state.

The data model is the skeleton of your frontend. Get it wrong and every feature you build fights against it. Get it right and features practically write themselves.

This is the stuff that separates "I can build a todo app" from "I designed Instagram's feed architecture." Let's get into it.

Client-Side Data Models

Before you write a single component, you need to answer: what entities exist in this system, and how do they relate to each other?

Mental Model

Think of your data model like a database schema, but for the client. You're not just storing what the server gives you -- you're designing how your UI thinks about data. A server might return a flat JSON blob with 40 fields. Your client-side model should only keep what the UI actually needs, structured in the shape the UI actually consumes it.

Let's say you're building a project management tool (think Linear). Here's what a thoughtful client-side model looks like:

interface User {
  id: string;
  name: string;
  avatarUrl: string;
  email: string;
}

interface Project {
  id: string;
  name: string;
  slug: string;
  ownerId: string;
  memberIds: string[];
  createdAt: string;
}

interface Issue {
  id: string;
  title: string;
  description: string;
  status: "backlog" | "todo" | "in_progress" | "done" | "cancelled";
  priority: "urgent" | "high" | "medium" | "low" | "none";
  assigneeId: string | null;
  projectId: string;
  labelIds: string[];
  createdAt: string;
  updatedAt: string;
}

interface Label {
  id: string;
  name: string;
  color: string;
}

interface Comment {
  id: string;
  issueId: string;
  authorId: string;
  body: string;
  createdAt: string;
}

Notice a few things:

  • IDs are strings, not numbers. UUIDs scale across distributed systems. Auto-incrementing integers leak information (a competitor can estimate your issue volume) and break in multi-region setups.
  • Relationships use IDs, not nested objects. assigneeId: string instead of assignee: User. This is the single most important decision in client-side modeling. We'll dig into why next.
  • Enums use string unions, not numbers. status: "todo" is self-documenting. status: 2 requires everyone to memorize a mapping.
  • Timestamps are strings. ISO 8601 strings serialize cleanly. Date objects don't survive JSON serialization in caches, localStorage, or server component boundaries.
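
To make the "keep only what the UI needs" point concrete, here's a minimal sketch of mapping at the API boundary. The wide ServerUserDTO shape and the toUser helper are hypothetical -- the idea is that everything past the boundary sees only the client model from above:

```typescript
// Hypothetical wide server DTO -- most of these fields are never rendered.
interface ServerUserDTO {
  id: string;
  name: string;
  avatar_url: string;
  email: string;
  created_at: string;
  last_login_ip: string;
  internal_flags: number;
}

// The client model from earlier in this section.
interface User {
  id: string;
  name: string;
  avatarUrl: string;
  email: string;
}

// Map at the fetch boundary so the rest of the app never touches the DTO.
function toUser(dto: ServerUserDTO): User {
  return {
    id: dto.id,
    name: dto.name,
    avatarUrl: dto.avatar_url,
    email: dto.email,
  };
}
```
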
Quiz
Why do client-side data models use ID references (assigneeId: string) instead of nested objects (assignee: User)?

Normalized vs Denormalized State

This is the decision that haunts frontend teams at scale. Let's get precise about what each approach actually means.

Denormalized: What the Server Gives You

Most APIs return denormalized data -- entities nested inside each other:

interface DenormalizedIssue {
  id: string;
  title: string;
  assignee: {
    id: string;
    name: string;
    avatarUrl: string;
  } | null;
  project: {
    id: string;
    name: string;
    slug: string;
  };
  labels: Array<{
    id: string;
    name: string;
    color: string;
  }>;
  comments: Array<{
    id: string;
    body: string;
    createdAt: string;
    author: {
      id: string;
      name: string;
      avatarUrl: string;
    };
  }>;
}

This is convenient. You fetch one endpoint, get everything you need, pass it to a component, done. For simple apps, this is fine. But here's where it breaks down.

The Consistency Problem

User "Alice" appears as the assignee on 30 issues and the author on 200 comments. She updates her avatar. In a denormalized world, you now have 230 stale copies of her old avatar scattered across your state tree. Your options:

  1. Refetch everything. Wasteful. You're re-downloading hundreds of issues because one user changed their profile pic.
  2. Walk the tree and update every copy. Error-prone. Miss one and you have an inconsistent UI.
  3. Accept staleness. Sometimes fine. Often not.

Normalized: A Client-Side Database

Normalization means storing each entity type in its own flat lookup table, referenced by ID:

interface NormalizedState {
  users: Record<string, User>;
  projects: Record<string, Project>;
  issues: Record<string, Issue>;
  labels: Record<string, Label>;
  comments: Record<string, Comment>;
}

When data arrives from the API, you flatten it:

function normalizeIssueResponse(response: DenormalizedIssue): void {
  const { assignee, project, labels, comments, ...issue } = response;

  if (assignee) {
    store.users[assignee.id] = assignee;
  }

  store.projects[project.id] = project;

  for (const label of labels) {
    store.labels[label.id] = label;
  }

  for (const comment of comments) {
    store.users[comment.author.id] = comment.author;
    store.comments[comment.id] = {
      id: comment.id,
      issueId: issue.id,
      authorId: comment.author.id,
      body: comment.body,
      createdAt: comment.createdAt,
    };
  }

  store.issues[issue.id] = {
    ...issue,
    assigneeId: assignee?.id ?? null,
    projectId: project.id,
    labelIds: labels.map((l) => l.id),
  };
}

Now Alice's avatar lives in exactly one place. Update it once, and every component that reads users[aliceId] sees the new avatar immediately.
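
A minimal sketch of that single-write property, with the store reduced to a bare users lookup for illustration:

```typescript
// Simplified client model -- just the fields needed for the example.
interface User {
  id: string;
  name: string;
  avatarUrl: string;
}

// The flat lookup table from the normalized state above.
const users: Record<string, User> = {
  alice: { id: "alice", name: "Alice", avatarUrl: "/old.png" },
};

// One write to the single source of truth...
function updateAvatar(userId: string, avatarUrl: string): void {
  users[userId] = { ...users[userId], avatarUrl };
}

// ...and every read path sees the new value, because issues and comments
// store only assigneeId / authorId, never a copy of the user.
function getAuthor(authorId: string): User {
  return users[authorId];
}

updateAvatar("alice", "/new.png");
```
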

When to Use Which

| Factor | Denormalized | Normalized |
| --- | --- | --- |
| App complexity | Simple apps, few entities | Complex apps, many cross-referenced entities |
| Data duplication | High -- same entity copied everywhere | Zero -- single source of truth per entity |
| Update consistency | Must update every copy manually | Update once, reflected everywhere |
| Read performance | Fast -- data already shaped for components | Slightly slower -- must look up related entities |
| Write complexity | Simple -- just replace the blob | Higher -- must normalize incoming data |
| Memory usage | Higher with many duplicates | Lower -- each entity stored once |
| Best with | TanStack Query (server state stays denormalized) | Zustand/Redux (client-managed entities) |
| Real-world example | Blog with posts and comments | Linear, Figma, Notion -- collaborative tools |
Info

Here's the thing most people miss: you don't have to pick one. Modern apps often keep server cache denormalized (TanStack Query caches the API response as-is) and normalize only the entities the user actively edits. A read-heavy dashboard can stay denormalized. A collaborative editor needs normalization.

Quiz
You're building a Slack clone. Messages reference users (author), channels, reactions (which reference users), and threads. A user updates their display name. What's the most practical state architecture?

Entity Relationships on the Client

Relational databases have join tables and foreign keys. On the client, you need to be more creative.

One-to-Many

An issue has many comments. Simple -- store the parent ID on the child:

interface Comment {
  id: string;
  issueId: string; // foreign key
  body: string;
  createdAt: string;
}

// Derive the list when you need it
function getCommentsForIssue(issueId: string): Comment[] {
  return Object.values(store.comments)
    .filter((c) => c.issueId === issueId)
    .sort((a, b) => a.createdAt.localeCompare(b.createdAt));
}

For large collections, maintain an index to avoid scanning every time:

// Precomputed index: issueId -> commentId[]
const commentsByIssue: Record<string, string[]> = {};

function addComment(comment: Comment): void {
  store.comments[comment.id] = comment;

  if (!commentsByIssue[comment.issueId]) {
    commentsByIssue[comment.issueId] = [];
  }
  commentsByIssue[comment.issueId].push(comment.id);
}

Many-to-Many

An issue can have many labels; a label can be on many issues. Two approaches:

Approach 1: ID arrays on one side (simpler, works for small collections)

interface Issue {
  id: string;
  labelIds: string[]; // many-to-many
}

// "All issues with label X"
function getIssuesByLabel(labelId: string): Issue[] {
  return Object.values(store.issues)
    .filter((issue) => issue.labelIds.includes(labelId));
}

Approach 2: Junction records (scales better, mirrors the server)

interface IssueLabel {
  issueId: string;
  labelId: string;
}

const issueLabels: IssueLabel[] = [];

function getLabelsForIssue(issueId: string): Label[] {
  return issueLabels
    .filter((il) => il.issueId === issueId)
    .map((il) => store.labels[il.labelId]);
}

Use ID arrays when the relationship is small and rarely queried in reverse. Use junction records when you need efficient lookups in both directions or the relationship carries metadata (like addedAt, addedBy).
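
Here's a sketch of the junction-record approach with metadata, assuming the relationship carries addedAt/addedBy fields as described above; the index names and helpers are illustrative:

```typescript
// Junction record carrying relationship metadata, mirroring a server join table.
interface IssueLabel {
  issueId: string;
  labelId: string;
  addedAt: string;
  addedBy: string;
}

// Indexes in both directions so neither lookup has to scan the full list.
const byIssue: Record<string, IssueLabel[]> = {};
const byLabel: Record<string, IssueLabel[]> = {};

function addIssueLabel(link: IssueLabel): void {
  (byIssue[link.issueId] ??= []).push(link);
  (byLabel[link.labelId] ??= []).push(link);
}

// "All labels on issue X" -- O(1) index hit instead of a full scan.
function labelIdsForIssue(issueId: string): string[] {
  return (byIssue[issueId] ?? []).map((l) => l.labelId);
}

// "All issues with label Y" -- the reverse direction is just as cheap.
function issueIdsForLabel(labelId: string): string[] {
  return (byLabel[labelId] ?? []).map((l) => l.issueId);
}
```
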

State Categorization

Not all state is created equal. The biggest mistake teams make is dumping everything into one global store. Here's how to think about it:

The key insight: each category has different lifecycle, persistence, and sharing requirements. Mixing them creates a mess.

// Server state: TanStack Query
const { data: issues } = useQuery({
  queryKey: ["issues", projectId, filters],
  queryFn: () => fetchIssues(projectId, filters),
  staleTime: 30_000,
});

// URL state: searchParams
const searchParams = useSearchParams();
const status = searchParams.get("status") ?? "all";
const sort = searchParams.get("sort") ?? "updated";

// Client state: Zustand
const isSidebarOpen = useAppStore((s) => s.isSidebarOpen);

// Form state: React Hook Form
const { register, handleSubmit } = useForm<CreateIssueInput>();
Common Trap

A common anti-pattern is storing URL-derivable state in Zustand. If the current filter is in the URL (?status=todo), don't also store it in a Zustand store. You'll have two sources of truth that drift apart. Read from the URL, and treat router.push as your "setState."
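
One way to keep the URL as the single source of truth is to compute the next query string from the current one and hand it to the router. This framework-agnostic sketch uses only URLSearchParams; in Next.js you would pass the result to router.push (the patch shape here is an assumption, not a framework API):

```typescript
// Compute the next query string from the current one. A null value means
// "reset to default" and drops the key from the URL entirely.
function patchSearchParams(
  current: string,
  patch: Record<string, string | null>
): string {
  const params = new URLSearchParams(current);
  for (const [key, value] of Object.entries(patch)) {
    if (value === null) {
      params.delete(key);
    } else {
      params.set(key, value);
    }
  }
  return params.toString();
}

// Usage sketch: router.push(`?${patchSearchParams(location.search, { status: "done" })}`)
```
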

Quiz
A user's selected theme (light/dark/system) should be stored as...

API Design for Frontend

You're not just consuming APIs -- in system design interviews and real architecture work, you're designing them. Here's how to think about it from the frontend's perspective.

REST Endpoint Design

Good REST APIs are predictable. The frontend team should be able to guess the endpoint for any resource:

GET    /api/projects/:projectId/issues          -- list issues
GET    /api/projects/:projectId/issues/:issueId  -- get single issue
POST   /api/projects/:projectId/issues          -- create issue
PATCH  /api/projects/:projectId/issues/:issueId  -- update issue
DELETE /api/projects/:projectId/issues/:issueId  -- delete issue

Pagination: Cursor vs Offset

This is a classic interview question, and the tradeoffs are real.

Offset-based: GET /issues?page=3&limit=20

interface OffsetPaginatedResponse<T> {
  data: T[];
  total: number;
  page: number;
  pageSize: number;
  totalPages: number;
}

Cursor-based: GET /issues?cursor=abc123&limit=20

interface CursorPaginatedResponse<T> {
  data: T[];
  nextCursor: string | null;
  hasMore: boolean;
}
| Factor | Offset | Cursor |
| --- | --- | --- |
| Jump to page N | Yes -- ?page=N | No -- must traverse sequentially |
| Show total count | Yes -- included in response | Expensive -- requires separate COUNT query |
| Real-time inserts | Breaks -- inserting a row shifts all offsets, causing duplicates or skips | Stable -- cursor points to a specific row |
| Performance at depth | Slow -- OFFSET 10000 scans 10000 rows | Fast -- seeks to cursor position directly |
| Infinite scroll | Works but fragile | Purpose-built for this |
| Best for | Admin tables, paginated search results | Feeds, timelines, infinite lists, real-time data |
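
The real-time inserts row is the one that bites in practice. This toy simulation (the in-memory rows array and fetchPage "server" are purely illustrative) shows a cursor surviving an insert that would have broken offsets:

```typescript
interface Row {
  id: number;
}

// Toy in-memory "server": newest rows first, cursor = last id the client saw.
let rows: Row[] = Array.from({ length: 6 }, (_, i) => ({ id: 6 - i })); // ids 6..1

function fetchPage(cursor: number | null, limit: number) {
  const start =
    cursor === null ? 0 : rows.findIndex((r) => r.id === cursor) + 1;
  const data = rows.slice(start, start + limit);
  return { data, nextCursor: data.length ? data[data.length - 1].id : null };
}

// Page 1: ids 6, 5, 4.
const page1 = fetchPage(null, 3);

// A new row arrives at the top while the user is reading page 1.
// With offset pagination, offset=3 would now re-serve id 4 (a duplicate).
rows = [{ id: 7 }, ...rows];

// The cursor resumes after id 4 regardless -- no duplicates, no skips.
const page2 = fetchPage(page1.nextCursor, 3); // ids 3, 2, 1
```
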

Filtering and Sorting

Design filter APIs that map cleanly to UI controls:

interface IssueFilters {
  status?: string[];          // multi-select: ?status=todo,in_progress
  priority?: string[];        // multi-select: ?priority=high,urgent
  assigneeId?: string | null; // single select or "unassigned" (null)
  labelIds?: string[];        // multi-select
  search?: string;            // text search: ?search=login%20bug
  sort?: string;              // ?sort=updatedAt or ?sort=-priority (desc)
}

function buildQueryString(filters: IssueFilters): string {
  const params = new URLSearchParams();

  for (const [key, value] of Object.entries(filters)) {
    if (value === undefined) continue;
    if (value === null) {
      // null is meaningful here: assigneeId === null means "unassigned",
      // so encode it explicitly instead of dropping the filter
      params.set(key, "null");
    } else if (Array.isArray(value) && value.length > 0) {
      params.set(key, value.join(","));
    } else if (typeof value === "string" && value.length > 0) {
      params.set(key, value);
    }
  }

  return params.toString();
}
Quiz
Your issue list shows 10,000 issues. Users report seeing duplicate issues when scrolling. The API uses offset pagination. What happened?

REST vs GraphQL vs WebSocket

| Aspect | REST | GraphQL | WebSocket |
| --- | --- | --- | --- |
| Data fetching | Fixed response shape per endpoint | Client specifies exact fields needed | Server pushes data as events occur |
| Over-fetching | Common -- endpoints return everything | Eliminated -- request only what you render | N/A -- events are purpose-built |
| Under-fetching | Common -- need multiple requests for related data | Eliminated -- fetch nested resources in one query | N/A -- subscribe to specific events |
| Caching | Simple -- HTTP caching, ETags, CDN-friendly | Complex -- normalized cache required (Apollo/urql) | Manual -- maintain client-side state |
| Real-time | Polling only (inefficient) | Subscriptions (adds complexity) | Native -- built for real-time |
| Tooling maturity | Excellent -- fetch, axios, TanStack Query | Good -- Apollo, urql, Relay | Moderate -- Socket.IO, native WS |
| When to use | CRUD apps, public APIs, CDN-cached content | Complex UIs with many entity relationships | Chat, notifications, live collaboration |

GraphQL: When It Helps

GraphQL shines when your UI has complex, variable data needs:

# One request gets everything the issue detail page needs
query IssueDetail($id: ID!) {
  issue(id: $id) {
    id
    title
    description
    status
    assignee {
      id
      name
      avatarUrl
    }
    labels {
      id
      name
      color
    }
    comments(first: 20) {
      edges {
        node {
          id
          body
          author {
            id
            name
          }
          createdAt
        }
      }
      pageInfo {
        hasNextPage
        endCursor
      }
    }
  }
}

With REST, this would be 3-4 separate requests: the issue, the assignee, the labels, the comments. GraphQL collapses them into one round trip.
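
As a rough sketch of what those round trips look like on the REST side -- the endpoint paths and the injected Fetcher are hypothetical, and a real client would add error handling and retries:

```typescript
// Minimal local types for the sketch.
interface Issue {
  id: string;
  assigneeId: string | null;
}
interface User {
  id: string;
  name: string;
}
interface Label {
  id: string;
  name: string;
}
interface Comment {
  id: string;
  body: string;
}

// Injected so the data flow is testable without a network.
type Fetcher = <T>(url: string) => Promise<T>;

async function loadIssueDetail(fetchJson: Fetcher, issueId: string) {
  // The issue must arrive first -- we need assigneeId from it...
  const issue = await fetchJson<Issue>(`/api/issues/${issueId}`);

  // ...but the follow-up requests can at least run in parallel.
  const [assignee, labels, comments] = await Promise.all([
    issue.assigneeId
      ? fetchJson<User>(`/api/users/${issue.assigneeId}`)
      : Promise.resolve<User | null>(null),
    fetchJson<Label[]>(`/api/issues/${issueId}/labels`),
    fetchJson<Comment[]>(`/api/issues/${issueId}/comments?limit=20`),
  ]);

  return { issue, assignee, labels, comments };
}
```

Even with Promise.all, that's two sequential waterfalls (issue, then its relations) where the GraphQL query above is one round trip.
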

GraphQL: When It Hurts

  • Caching is hard. REST responses are URL-addressable and CDN-cacheable. GraphQL POST requests are not. You need a normalized client cache (Apollo, urql) that adds bundle size and complexity.
  • Error handling is weird. A GraphQL response can be 200 OK with partial errors. Your error handling must check both the HTTP status and the errors array.
  • N+1 on the server. Without DataLoader, a nested query can trigger thousands of database queries. This is a server problem, but it impacts frontend performance.

The GraphQL Caching Tax

Apollo Client's normalized cache adds ~35-50KB to your bundle (minified + gzipped). For a simple CRUD app, that's a significant tax for something TanStack Query does in ~12KB. The normalization logic itself is non-trivial -- Apollo must parse your query, extract entity IDs, store each entity in a flat lookup, and reconstruct the query shape on read. Every cache read is a tree traversal.

The breakeven point is roughly when you have 5+ screens that share overlapping entities and you need instant consistency when one screen mutates shared data. Below that threshold, TanStack Query with manual invalidation is simpler and lighter.

WebSocket APIs

For real-time features -- chat, notifications, collaborative editing, live dashboards -- WebSockets give you server-push without polling overhead.

Event Schema Design

Structure your WebSocket messages with a type-safe event schema:

type ServerEvent =
  | { type: "issue.updated"; payload: { issueId: string; changes: Partial<Issue> } }
  | { type: "issue.created"; payload: { issue: Issue } }
  | { type: "issue.deleted"; payload: { issueId: string } }
  | { type: "comment.created"; payload: { comment: Comment } }
  | { type: "presence.changed"; payload: { userId: string; status: "online" | "away" | "offline" } }
  | { type: "error"; payload: { code: string; message: string } };

type ClientEvent =
  | { type: "subscribe"; payload: { channel: string } }
  | { type: "unsubscribe"; payload: { channel: string } }
  | { type: "ping" };

Discriminated unions make the event handler exhaustive -- TypeScript will catch unhandled event types:

function handleServerEvent(event: ServerEvent): void {
  switch (event.type) {
    case "issue.updated":
      store.issues[event.payload.issueId] = {
        ...store.issues[event.payload.issueId],
        ...event.payload.changes,
      };
      break;
    case "issue.created":
      store.issues[event.payload.issue.id] = event.payload.issue;
      break;
    case "issue.deleted":
      delete store.issues[event.payload.issueId];
      break;
    case "comment.created":
      store.comments[event.payload.comment.id] = event.payload.comment;
      break;
    case "presence.changed":
      store.presence[event.payload.userId] = event.payload.status;
      break;
    case "error":
      console.error(`WS error [${event.payload.code}]: ${event.payload.message}`);
      break;
    default: {
      // Exhaustiveness check: a new ServerEvent variant without a case above
      // fails to compile here instead of being silently ignored at runtime.
      const _exhaustive: never = event;
      return _exhaustive;
    }
  }
}

Reconnection Strategy

WebSocket connections drop. Networks change. Tabs go to sleep. You need a reconnection strategy that doesn't hammer the server:

function createReconnectingSocket(url: string) {
  let attempt = 0;
  let socket: WebSocket | null = null;

  function connect() {
    socket = new WebSocket(url);

    socket.onopen = () => {
      attempt = 0; // reset on successful connection
    };

    socket.onclose = (event) => {
      if (event.code === 1000) return; // clean close, don't reconnect

      attempt++;
      const delay = Math.min(1000 * 2 ** attempt, 30_000); // exponential backoff, max 30s
      const jitter = delay * (0.5 + Math.random() * 0.5);  // add jitter to prevent thundering herd
      setTimeout(connect, jitter);
    };
  }

  connect();
  return { getSocket: () => socket };
}

The exponential backoff with jitter is critical. Without it, if your server goes down, every client reconnects at the same instant when it comes back up -- a thundering herd that crashes it again.

Optimistic Updates

Optimistic updates make your UI feel instant. Instead of waiting for the server to confirm a mutation, you update the client state immediately and reconcile later.

Mental Model

Optimistic updates are like writing a check. You hand over the check (update the UI) before the money clears (server confirms). Most of the time, it clears fine. But you need a plan for when it bounces (server returns an error) -- roll back the UI to its previous state.

async function updateIssueStatus(
  issueId: string,
  newStatus: Issue["status"]
): Promise<void> {
  const previousStatus = store.issues[issueId].status;

  // Optimistic: update UI immediately
  store.issues[issueId].status = newStatus;

  try {
    await api.patch(`/issues/${issueId}`, { status: newStatus });
  } catch (error) {
    // Rollback: restore previous state
    store.issues[issueId].status = previousStatus;
    toast.error("Failed to update status. Please try again.");
  }
}

With TanStack Query, this is built in:

const mutation = useMutation({
  mutationFn: (newStatus: string) =>
    api.patch(`/issues/${issueId}`, { status: newStatus }),
  onMutate: async (newStatus) => {
    await queryClient.cancelQueries({ queryKey: ["issue", issueId] });
    const previous = queryClient.getQueryData(["issue", issueId]);
    queryClient.setQueryData(["issue", issueId], (old: Issue) => ({
      ...old,
      status: newStatus,
    }));
    return { previous };
  },
  onError: (_err, _newStatus, context) => {
    queryClient.setQueryData(["issue", issueId], context?.previous);
  },
  onSettled: () => {
    queryClient.invalidateQueries({ queryKey: ["issue", issueId] });
  },
});
Common Trap

Never do optimistic updates for operations that generate server-side data. If the server computes a createdAt timestamp, generates a UUID, or runs business logic that produces new fields, you can't accurately predict the result. Optimistic updates work best for simple field changes where the client already knows the final state.
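
One escape hatch for creates is a temporary ID that gets swapped for the server's entity on confirmation. A minimal sketch -- the store shape and helper names are illustrative, and the temp- prefix is just a convention for spotting pending entities:

```typescript
interface Issue {
  id: string;
  title: string;
  pending?: boolean; // true while the create is unconfirmed
}

const issues: Record<string, Issue> = {};

// Optimistic create: insert under a clearly-marked temporary ID.
function applyOptimisticCreate(title: string): string {
  const tempId = `temp-${Math.random().toString(36).slice(2)}`;
  issues[tempId] = { id: tempId, title, pending: true };
  return tempId;
}

// On server confirmation, swap the phantom entry for the real entity
// (which carries the server-generated ID, createdAt, etc.).
function confirmCreate(tempId: string, serverIssue: Issue): void {
  delete issues[tempId];
  issues[serverIssue.id] = serverIssue;
}

// On rejection, just drop the phantom and surface an error to the user.
function rollbackCreate(tempId: string): void {
  delete issues[tempId];
}
```
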

Quiz
A user drags an issue from 'Todo' to 'In Progress' in a kanban board. You use optimistic updates. The server rejects the move because the issue is locked by another user. What should the UI do?

Caching Strategies

Caching is the difference between an app that feels instant and one that makes users wait on every navigation.

Stale-While-Revalidate

The most important caching pattern in modern frontend development. Serve cached (potentially stale) data immediately, then revalidate in the background:

const { data: issues } = useQuery({
  queryKey: ["issues", projectId],
  queryFn: () => fetchIssues(projectId),
  staleTime: 30_000,      // data is "fresh" for 30 seconds
  gcTime: 5 * 60_000,     // keep in cache for 5 minutes after last use
  refetchOnWindowFocus: true,
});

The user sees data immediately from cache. If it's stale, a background refetch updates it seamlessly. No loading spinners for navigating between pages you've already visited.

Cache Keys

Your cache key design determines how granular your cache invalidation can be:

// Hierarchical keys enable targeted invalidation
["issues"]                          // all issues
["issues", projectId]               // issues for a project
["issues", projectId, { status, sort, page }]  // specific filtered view

// Invalidate all issues for a project (including all filter combinations)
queryClient.invalidateQueries({ queryKey: ["issues", projectId] });

// Invalidate just the "todo" filtered view
queryClient.invalidateQueries({
  queryKey: ["issues", projectId, { status: "todo" }],
});

Cache Invalidation

There are only two hard problems in computer science: cache invalidation, naming things, and off-by-one errors.

Three approaches, from simplest to most precise:

1. Time-based expiry (staleTime): Data becomes stale after N seconds. Simple, but imprecise -- you show stale data until the timer expires.

2. Event-driven invalidation: Invalidate when you know data changed:

const createIssue = useMutation({
  mutationFn: (data: CreateIssueInput) => api.post("/issues", data),
  onSuccess: () => {
    // We know the issue list changed -- invalidate it
    queryClient.invalidateQueries({ queryKey: ["issues", projectId] });
  },
});

3. Real-time invalidation via WebSocket: The server tells you when data changed:

socket.addEventListener("message", (event) => {
  const msg = JSON.parse(event.data) as ServerEvent;

  if (msg.type === "issue.updated" || msg.type === "issue.created") {
    queryClient.invalidateQueries({ queryKey: ["issues"] });
  }
});

Error States in Data Models

Every piece of async data can be in one of several states. Model all of them, or your UI will break in production.

type AsyncState<T> =
  | { status: "idle" }
  | { status: "loading" }
  | { status: "error"; error: Error; retryCount: number }
  | { status: "success"; data: T; updatedAt: number }
  | { status: "refreshing"; data: T; updatedAt: number };

The refreshing state is subtle but important. It means "I have data, but I'm fetching a newer version." The UI should show the existing data (not a loading spinner) with a subtle refresh indicator. TanStack Query handles this with isFetching vs isLoading:

  • isLoading = first load, no cached data yet (show skeleton)
  • isFetching = any fetch, including background refetch (show subtle indicator)
  • isError = fetch failed (show error with retry button)
function IssueList({ projectId }: { projectId: string }) {
  const { data, isLoading, isFetching, isError, error, refetch } = useQuery({
    queryKey: ["issues", projectId],
    queryFn: () => fetchIssues(projectId),
  });

  if (isLoading) return <IssueListSkeleton />;
  if (isError) return <ErrorState message={error.message} onRetry={refetch} />;
  if (!data || data.length === 0) return <EmptyState message="No issues yet" />;

  return (
    <>
      {isFetching && <RefreshIndicator />}
      {data.map((issue) => (
        <IssueCard key={issue.id} issue={issue} />
      ))}
    </>
  );
}
Key Rules
  1. Model every async state: idle, loading, error, success, and refreshing -- missing any one leads to broken UI in production
  2. Normalize entities that appear in multiple places and change frequently -- denormalize everything else
  3. Use cursor pagination for infinite lists and real-time data, offset pagination only when users need to jump to arbitrary pages
  4. Optimistic updates only for mutations where the client can fully predict the result -- never for server-computed fields
  5. Cache keys should be hierarchical so you can invalidate at any level of granularity
  6. URL state is free state management -- if it belongs in the URL (filters, sort, page), put it there instead of Zustand
  7. WebSocket reconnection needs exponential backoff with jitter -- without it, server recovery causes thundering herd
| What developers do | Why it breaks | What they should do |
| --- | --- | --- |
| Storing the same entity data in multiple Zustand slices without normalization | A user's avatar changes and now half your UI shows the old one, half shows the new one. With normalization, update it once and every component that references that user ID sees the change. | Normalize shared entities into a single lookup table, reference by ID elsewhere |
| Using offset pagination for an infinite-scroll feed with real-time inserts | New items inserted at the top shift every offset by one. Users see duplicate items or skip items entirely when scrolling to the next page. | Use cursor-based pagination anchored to a specific row ID or timestamp |
| Doing optimistic updates for create operations and generating a client-side ID | If you generate a UUID client-side and the server generates a different one, you now have a phantom entity in your cache. Other users can't reference an issue by your client-generated ID. | Show a pending state for creates, or use a temporary ID that gets replaced on server confirmation |
| Putting filter state in Zustand when it's already in the URL search params | Two sources of truth that inevitably drift. The URL says ?status=todo but Zustand says 'all'. Which one is right? Users can't bookmark or share filtered views if the state lives in memory. | Read filters from URL search params directly, use router.push as your setState |
| Showing a full-page loading spinner on every navigation between cached pages | Users already visited this page 10 seconds ago. The data hasn't changed. Making them stare at a spinner destroys the sense of speed that makes apps like Linear feel fast. | Use stale-while-revalidate: show cached data instantly, refetch in the background |
Quiz
Your TanStack Query cache key is ['issues', projectId, filters]. You create a new issue. Which invalidation strategy ensures the issue list updates without over-invalidating unrelated queries?