Data Model and API Design
The Invisible Architecture
Here's a pattern you see at every company that scaled past 50 engineers: the frontend team rewrites their data layer. Not because the old one was buggy, but because nobody designed it. They just... started fetching data and stuffing it into state.
The data model is the skeleton of your frontend. Get it wrong and every feature you build fights against it. Get it right and features practically write themselves.
This is the stuff that separates "I can build a todo app" from "I designed Instagram's feed architecture." Let's get into it.
Client-Side Data Models
Before you write a single component, you need to answer: what entities exist in this system, and how do they relate to each other?
Think of your data model like a database schema, but for the client. You're not just storing what the server gives you -- you're designing how your UI thinks about data. A server might return a flat JSON blob with 40 fields. Your client-side model should only keep what the UI actually needs, structured in the shape the UI actually consumes it.
Let's say you're building a project management tool (think Linear). Here's what a thoughtful client-side model looks like:
interface User {
id: string;
name: string;
avatarUrl: string;
email: string;
}
interface Project {
id: string;
name: string;
slug: string;
ownerId: string;
memberIds: string[];
createdAt: string;
}
interface Issue {
id: string;
title: string;
description: string;
status: "backlog" | "todo" | "in_progress" | "done" | "cancelled";
priority: "urgent" | "high" | "medium" | "low" | "none";
assigneeId: string | null;
projectId: string;
labelIds: string[];
createdAt: string;
updatedAt: string;
}
interface Label {
id: string;
name: string;
color: string;
}
interface Comment {
id: string;
issueId: string;
authorId: string;
body: string;
createdAt: string;
}
Notice a few things:
- IDs are strings, not numbers. UUIDs scale across distributed systems. Auto-incrementing integers leak information (competitors can guess your issue count) and break in multi-region setups.
- Relationships use IDs, not nested objects. `assigneeId: string` instead of `assignee: User`. This is the single most important decision in client-side modeling. We'll dig into why next.
- Enums use string unions, not numbers. `status: "todo"` is self-documenting; `status: 2` requires everyone to memorize a mapping.
- Timestamps are strings. ISO 8601 strings serialize cleanly; `Date` objects don't survive JSON serialization in caches, localStorage, or server component boundaries.
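The last point is easy to verify. Any cache write, localStorage entry, or server component boundary implies a JSON round trip, and JSON has no Date type:

```typescript
const issue = { id: "1", createdAt: new Date("2024-01-15T10:00:00Z") };

// A JSON round trip -- what every cache or serialization boundary does.
const cached = JSON.parse(JSON.stringify(issue));

console.log(issue.createdAt instanceof Date);  // true
console.log(cached.createdAt instanceof Date); // false
console.log(cached.createdAt);                 // "2024-01-15T10:00:00.000Z"
```

Storing ISO strings from the start means the shape is identical on both sides of every boundary; parse into a `Date` only at the point of display.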
Normalized vs Denormalized State
This is the decision that haunts frontend teams at scale. Let's get precise about what each approach actually means.
Denormalized: What the Server Gives You
Most APIs return denormalized data -- entities nested inside each other:
interface DenormalizedIssue {
id: string;
title: string;
assignee: {
id: string;
name: string;
avatarUrl: string;
} | null;
project: {
id: string;
name: string;
slug: string;
};
labels: Array<{
id: string;
name: string;
color: string;
}>;
comments: Array<{
id: string;
body: string;
createdAt: string;
author: {
id: string;
name: string;
avatarUrl: string;
};
}>;
}
This is convenient. You fetch one endpoint, get everything you need, pass it to a component, done. For simple apps, this is fine. But here's where it breaks down.
The Consistency Problem
User "Alice" appears as the assignee on 30 issues and the author on 200 comments. She updates her avatar. In a denormalized world, you now have 230 stale copies of her old avatar scattered across your state tree. Your options:
- Refetch everything. Wasteful. You're re-downloading hundreds of issues because one user changed their profile pic.
- Walk the tree and update every copy. Error-prone. Miss one and you have an inconsistent UI.
- Accept staleness. Sometimes fine. Often not.
Normalized: A Client-Side Database
Normalization means storing each entity type in its own flat lookup table, referenced by ID:
interface NormalizedState {
users: Record<string, User>;
projects: Record<string, Project>;
issues: Record<string, Issue>;
labels: Record<string, Label>;
comments: Record<string, Comment>;
}
When data arrives from the API, you flatten it:
function normalizeIssueResponse(response: DenormalizedIssue): void {
const { assignee, project, labels, comments, ...issue } = response;
if (assignee) {
store.users[assignee.id] = assignee;
}
store.projects[project.id] = project;
for (const label of labels) {
store.labels[label.id] = label;
}
for (const comment of comments) {
store.users[comment.author.id] = comment.author;
store.comments[comment.id] = {
id: comment.id,
issueId: issue.id,
authorId: comment.author.id,
body: comment.body,
createdAt: comment.createdAt,
};
}
store.issues[issue.id] = {
...issue,
assigneeId: assignee?.id ?? null,
projectId: project.id,
labelIds: labels.map((l) => l.id),
};
}
Now Alice's avatar lives in exactly one place. Update it once, and every component that reads users[aliceId] sees the new avatar immediately.
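The flip side is that components now need the data re-joined. The usual pattern is a selector that reassembles the view shape on read. Here's a sketch; the local type definitions just mirror trimmed versions of the interfaces above, so the snippet stands alone:

```typescript
// Local copies of the entity types, trimmed to what the example needs.
interface User { id: string; name: string; avatarUrl: string }
interface Label { id: string; name: string; color: string }
interface Issue { id: string; title: string; assigneeId: string | null; labelIds: string[] }

interface NormalizedState {
  users: Record<string, User>;
  issues: Record<string, Issue>;
  labels: Record<string, Label>;
}

// Re-join entities on read. Alice's avatar still lives in exactly one
// place; every caller sees the same copy.
function selectIssueView(state: NormalizedState, issueId: string) {
  const issue = state.issues[issueId];
  return {
    issue,
    assignee: issue.assigneeId ? state.users[issue.assigneeId] ?? null : null,
    labels: issue.labelIds
      .map((id) => state.labels[id])
      .filter((l): l is Label => l !== undefined),
  };
}
```

Reads do a little more work than in the denormalized world, but the work is cheap (object lookups by key) and the consistency guarantee is what you're paying for.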
When to Use Which
| Factor | Denormalized | Normalized |
|---|---|---|
| App complexity | Simple apps, few entities | Complex apps, many cross-referenced entities |
| Data duplication | High -- same entity copied everywhere | Zero -- single source of truth per entity |
| Update consistency | Must update every copy manually | Update once, reflected everywhere |
| Read performance | Fast -- data already shaped for components | Slightly slower -- must look up related entities |
| Write complexity | Simple -- just replace the blob | Higher -- must normalize incoming data |
| Memory usage | Higher with many duplicates | Lower -- each entity stored once |
| Best with | TanStack Query (server state stays denormalized) | Zustand/Redux (client-managed entities) |
| Real-world example | Blog with posts and comments | Linear, Figma, Notion -- collaborative tools |
Here's the thing most people miss: you don't have to pick one. Modern apps often keep server cache denormalized (TanStack Query caches the API response as-is) and normalize only the entities the user actively edits. A read-heavy dashboard can stay denormalized. A collaborative editor needs normalization.
Entity Relationships on the Client
Relational databases have join tables and foreign keys. On the client, you need to be more creative.
One-to-Many
An issue has many comments. Simple -- store the parent ID on the child:
interface Comment {
id: string;
issueId: string; // foreign key
body: string;
createdAt: string;
}
// Derive the list when you need it
function getCommentsForIssue(issueId: string): Comment[] {
return Object.values(store.comments)
.filter((c) => c.issueId === issueId)
.sort((a, b) => a.createdAt.localeCompare(b.createdAt));
}
For large collections, maintain an index to avoid scanning every time:
// Precomputed index: issueId -> commentId[]
const commentsByIssue: Record<string, string[]> = {};
function addComment(comment: Comment): void {
store.comments[comment.id] = comment;
if (!commentsByIssue[comment.issueId]) {
commentsByIssue[comment.issueId] = [];
}
commentsByIssue[comment.issueId].push(comment.id);
}
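The add path is only half the job. Anything that deletes a comment has to keep the index in sync, or lookups start returning ghost IDs. A self-contained sketch with local state:

```typescript
interface Comment { id: string; issueId: string; body: string }

const comments: Record<string, Comment> = {};
const commentsByIssue: Record<string, string[]> = {};

function addComment(comment: Comment): void {
  comments[comment.id] = comment;
  (commentsByIssue[comment.issueId] ??= []).push(comment.id);
}

// The delete path must mirror the add path: remove the entity AND its
// entry in the index, or the index silently goes stale.
function removeComment(commentId: string): void {
  const comment = comments[commentId];
  if (!comment) return;
  delete comments[commentId];
  commentsByIssue[comment.issueId] =
    (commentsByIssue[comment.issueId] ?? []).filter((id) => id !== commentId);
}
```

This is exactly the bookkeeping a normalized cache library does for you; if you hand-roll indexes, centralize add/remove in functions like these so no call site can forget one side.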
Many-to-Many
An issue can have many labels; a label can be on many issues. Two approaches:
Approach 1: ID arrays on one side (simpler, works for small collections)
interface Issue {
id: string;
labelIds: string[]; // many-to-many
}
// "All issues with label X"
function getIssuesByLabel(labelId: string): Issue[] {
return Object.values(store.issues)
.filter((issue) => issue.labelIds.includes(labelId));
}
Approach 2: Junction records (scales better, mirrors the server)
interface IssueLabel {
issueId: string;
labelId: string;
}
const issueLabels: IssueLabel[] = [];
function getLabelsForIssue(issueId: string): Label[] {
return issueLabels
.filter((il) => il.issueId === issueId)
.map((il) => store.labels[il.labelId]);
}
Use ID arrays when the relationship is small and rarely queried in reverse. Use junction records when you need efficient lookups in both directions or the relationship carries metadata (like addedAt, addedBy).
State Categorization
Not all state is created equal. The biggest mistake teams make is dumping everything into one global store. Here's how to think about it:
The key insight: each category has different lifecycle, persistence, and sharing requirements. Mixing them creates a mess.
// Server state: TanStack Query
const { data: issues } = useQuery({
queryKey: ["issues", projectId, filters],
queryFn: () => fetchIssues(projectId, filters),
staleTime: 30_000,
});
// URL state: searchParams
const searchParams = useSearchParams();
const status = searchParams.get("status") ?? "all";
const sort = searchParams.get("sort") ?? "updated";
// Client state: Zustand
const isSidebarOpen = useAppStore((s) => s.isSidebarOpen);
// Form state: React Hook Form
const { register, handleSubmit } = useForm<CreateIssueInput>();
A common anti-pattern is storing URL-derivable state in Zustand. If the current filter is in the URL (?status=todo), don't also store it in a Zustand store. You'll have two sources of truth that drift apart. Read from the URL, and treat router.push as your "setState."
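One way to make that ergonomic is a tiny pure helper that computes the next query string; the hook-shaped usage sketched in the comments assumes Next.js App Router, but the helper works with any router:

```typescript
// Pure helper: compute the next query string from the current one.
// Passing null removes the key (a default like "all" needs no param).
function withParam(current: string, key: string, value: string | null): string {
  const params = new URLSearchParams(current);
  if (value === null) params.delete(key);
  else params.set(key, value);
  return params.toString();
}

// In a component (sketch, assuming Next.js App Router):
//   const router = useRouter();
//   const searchParams = useSearchParams();
//   const status = searchParams.get("status") ?? "all";
//   const setStatus = (next: string) =>
//     router.push("?" + withParam(searchParams.toString(), "status",
//       next === "all" ? null : next));
```

Because the helper is pure, it's trivially testable and reusable across every filter control in the app.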
API Design for Frontend
You're not just consuming APIs -- in system design interviews and real architecture work, you're designing them. Here's how to think about it from the frontend's perspective.
REST Endpoint Design
Good REST APIs are predictable. The frontend team should be able to guess the endpoint for any resource:
GET /api/projects/:projectId/issues -- list issues
GET /api/projects/:projectId/issues/:issueId -- get single issue
POST /api/projects/:projectId/issues -- create issue
PATCH /api/projects/:projectId/issues/:issueId -- update issue
DELETE /api/projects/:projectId/issues/:issueId -- delete issue
Pagination: Cursor vs Offset
This is a classic interview question, and the tradeoffs are real.
Offset-based: GET /issues?page=3&limit=20
interface OffsetPaginatedResponse<T> {
data: T[];
total: number;
page: number;
pageSize: number;
totalPages: number;
}
Cursor-based: GET /issues?cursor=abc123&limit=20
interface CursorPaginatedResponse<T> {
data: T[];
nextCursor: string | null;
hasMore: boolean;
}
| Factor | Offset | Cursor |
|---|---|---|
| Jump to page N | Yes -- ?page=N | No -- must traverse sequentially |
| Show total count | Yes -- included in response | Expensive -- requires separate COUNT query |
| Real-time inserts | Breaks -- inserting row shifts all offsets, causing duplicates or skips | Stable -- cursor points to a specific row |
| Performance at depth | Slow -- OFFSET 10000 scans 10000 rows | Fast -- seeks to cursor position directly |
| Infinite scroll | Works but fragile | Purpose-built for this |
| Best for | Admin tables, paginated search results | Feeds, timelines, infinite lists, real-time data |
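The real-time inserts row is the one that bites in production, and it's easy to demonstrate with a toy in-memory model (hypothetical row ids): fetch page 1, let a new row land at the top, then fetch page 2 each way.

```typescript
interface Row { id: number }

// Rows sorted newest-first, as a feed would be.
const before: Row[] = [{ id: 10 }, { id: 9 }, { id: 8 }, { id: 7 }];

// Offset: "give me page N of size S".
function offsetPage(rows: Row[], page: number, size: number): Row[] {
  return rows.slice(page * size, page * size + size);
}

// Cursor: "give me S rows after the row with this id".
function cursorPage(rows: Row[], cursor: number | null, size: number): Row[] {
  const start = cursor === null ? 0 : rows.findIndex((r) => r.id === cursor) + 1;
  return rows.slice(start, start + size);
}

const page1 = offsetPage(before, 0, 2); // ids 10, 9

// A new row lands at the top between the two page fetches.
const after: Row[] = [{ id: 11 }, ...before];

// Offset page 2 re-serves id 9 -- the user sees a duplicate.
const offsetPage2 = offsetPage(after, 1, 2); // ids 9, 8

// The cursor is anchored to the last row the user saw (id 9),
// so the insert at the top doesn't shift anything.
const cursorPage2 = cursorPage(after, 9, 2); // ids 8, 7
```

Swap the insert for a deletion and offset pagination skips a row instead of duplicating one; the cursor version is unaffected either way.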
Filtering and Sorting
Design filter APIs that map cleanly to UI controls:
interface IssueFilters {
status?: string[]; // multi-select: ?status=todo,in_progress
priority?: string[]; // multi-select: ?priority=high,urgent
assigneeId?: string | null; // single select or "unassigned" (null)
labelIds?: string[]; // multi-select
search?: string; // text search: ?search=login%20bug
sort?: string; // ?sort=updatedAt or ?sort=-priority (desc)
}
function buildQueryString(filters: IssueFilters): string {
const params = new URLSearchParams();
for (const [key, value] of Object.entries(filters)) {
if (value === undefined || value === null) continue;
if (Array.isArray(value) && value.length > 0) {
params.set(key, value.join(","));
} else if (typeof value === "string" && value.length > 0) {
params.set(key, value);
}
}
return params.toString();
}
REST vs GraphQL vs WebSocket
| Aspect | REST | GraphQL | WebSocket |
|---|---|---|---|
| Data fetching | Fixed response shape per endpoint | Client specifies exact fields needed | Server pushes data as events occur |
| Over-fetching | Common -- endpoints return everything | Eliminated -- request only what you render | N/A -- events are purpose-built |
| Under-fetching | Common -- need multiple requests for related data | Eliminated -- fetch nested resources in one query | N/A -- subscribe to specific events |
| Caching | Simple -- HTTP caching, ETags, CDN-friendly | Complex -- normalized cache required (Apollo/urql) | Manual -- maintain client-side state |
| Real-time | Polling only (inefficient) | Subscriptions (adds complexity) | Native -- built for real-time |
| Tooling maturity | Excellent -- fetch, axios, TanStack Query | Good -- Apollo, urql, Relay | Moderate -- Socket.IO, native WS |
| When to use | CRUD apps, public APIs, CDN-cached content | Complex UIs with many entity relationships | Chat, notifications, live collaboration |
GraphQL: When It Helps
GraphQL shines when your UI has complex, variable data needs:
# One request gets everything the issue detail page needs
query IssueDetail($id: ID!) {
issue(id: $id) {
id
title
description
status
assignee {
id
name
avatarUrl
}
labels {
id
name
color
}
comments(first: 20) {
edges {
node {
id
body
author {
id
name
}
createdAt
}
}
pageInfo {
hasNextPage
endCursor
}
}
}
}
With REST, this would be 3-4 separate requests: the issue, the assignee, the labels, the comments. GraphQL collapses them into one round trip.
GraphQL: When It Hurts
- Caching is hard. REST responses are URL-addressable and CDN-cacheable. GraphQL POST requests are not. You need a normalized client cache (Apollo, urql) that adds bundle size and complexity.
- Error handling is weird. A GraphQL response can be `200 OK` with partial errors. Your error handling must check both the HTTP status and the `errors` array.
- N+1 on the server. Without DataLoader, a nested query can trigger thousands of database queries. This is a server problem, but it impacts frontend performance.
The GraphQL Caching Tax
Apollo Client's normalized cache adds ~35-50KB to your bundle (minified + gzipped). For a simple CRUD app, that's a significant tax for something TanStack Query does in ~12KB. The normalization logic itself is non-trivial -- Apollo must parse your query, extract entity IDs, store each entity in a flat lookup, and reconstruct the query shape on read. Every cache read is a tree traversal.
The breakeven point is roughly when you have 5+ screens that share overlapping entities and you need instant consistency when one screen mutates shared data. Below that threshold, TanStack Query with manual invalidation is simpler and lighter.
WebSocket APIs
For real-time features -- chat, notifications, collaborative editing, live dashboards -- WebSockets give you server-push without polling overhead.
Event Schema Design
Structure your WebSocket messages with a type-safe event schema:
type ServerEvent =
| { type: "issue.updated"; payload: { issueId: string; changes: Partial<Issue> } }
| { type: "issue.created"; payload: { issue: Issue } }
| { type: "issue.deleted"; payload: { issueId: string } }
| { type: "comment.created"; payload: { comment: Comment } }
| { type: "presence.changed"; payload: { userId: string; status: "online" | "away" | "offline" } }
| { type: "error"; payload: { code: string; message: string } };
type ClientEvent =
| { type: "subscribe"; payload: { channel: string } }
| { type: "unsubscribe"; payload: { channel: string } }
| { type: "ping" };
Discriminated unions make the event handler exhaustive -- TypeScript will catch unhandled event types:
function handleServerEvent(event: ServerEvent): void {
switch (event.type) {
case "issue.updated":
store.issues[event.payload.issueId] = {
...store.issues[event.payload.issueId],
...event.payload.changes,
};
break;
case "issue.created":
store.issues[event.payload.issue.id] = event.payload.issue;
break;
case "issue.deleted":
delete store.issues[event.payload.issueId];
break;
case "comment.created":
store.comments[event.payload.comment.id] = event.payload.comment;
break;
case "presence.changed":
store.presence[event.payload.userId] = event.payload.status;
break;
case "error":
console.error(`WS error [${event.payload.code}]: ${event.payload.message}`);
break;
}
}
Reconnection Strategy
WebSocket connections drop. Networks change. Tabs go to sleep. You need a reconnection strategy that doesn't hammer the server:
function createReconnectingSocket(url: string) {
let attempt = 0;
let socket: WebSocket | null = null;
function connect() {
socket = new WebSocket(url);
socket.onopen = () => {
attempt = 0; // reset on successful connection
};
socket.onclose = (event) => {
if (event.code === 1000) return; // clean close, don't reconnect
attempt++;
const delay = Math.min(1000 * 2 ** attempt, 30_000); // exponential backoff, max 30s
const jitter = delay * (0.5 + Math.random() * 0.5); // add jitter to prevent thundering herd
setTimeout(connect, jitter);
};
}
connect();
return { getSocket: () => socket };
}
The exponential backoff with jitter is critical. Without it, if your server goes down, every client reconnects at the same instant when it comes back up -- a thundering herd that crashes it again.
Optimistic Updates
Optimistic updates make your UI feel instant. Instead of waiting for the server to confirm a mutation, you update the client state immediately and reconcile later.
Optimistic updates are like writing a check. You hand over the check (update the UI) before the money clears (server confirms). Most of the time, it clears fine. But you need a plan for when it bounces (server returns an error) -- roll back the UI to its previous state.
async function updateIssueStatus(
issueId: string,
newStatus: Issue["status"]
): Promise<void> {
const previousStatus = store.issues[issueId].status;
// Optimistic: update UI immediately
store.issues[issueId].status = newStatus;
try {
await api.patch(`/issues/${issueId}`, { status: newStatus });
} catch (error) {
// Rollback: restore previous state
store.issues[issueId].status = previousStatus;
toast.error("Failed to update status. Please try again.");
}
}
With TanStack Query, this is built in:
const mutation = useMutation({
mutationFn: (newStatus: string) =>
api.patch(`/issues/${issueId}`, { status: newStatus }),
onMutate: async (newStatus) => {
await queryClient.cancelQueries({ queryKey: ["issue", issueId] });
const previous = queryClient.getQueryData(["issue", issueId]);
queryClient.setQueryData(["issue", issueId], (old: Issue) => ({
...old,
status: newStatus,
}));
return { previous };
},
onError: (_err, _newStatus, context) => {
queryClient.setQueryData(["issue", issueId], context?.previous);
},
onSettled: () => {
queryClient.invalidateQueries({ queryKey: ["issue", issueId] });
},
});
Never do optimistic updates for operations that generate server-side data. If the server computes a createdAt timestamp, generates a UUID, or runs business logic that produces new fields, you can't accurately predict the result. Optimistic updates work best for simple field changes where the client already knows the final state.
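When you do need an optimistic create, the temporary-ID pattern keeps the cache honest: insert the entity under a client-generated placeholder id, then swap it for the server's record on confirmation. A sketch of the swap step over a flat lookup table:

```typescript
interface Entity { id: string }

// Remove the placeholder entry and insert the server-confirmed entity.
// Returning a new object keeps this friendly to immutable stores.
function confirmCreate<T extends Entity>(
  table: Record<string, T>,
  tempId: string,
  confirmed: T
): Record<string, T> {
  const { [tempId]: _pending, ...rest } = table;
  return { ...rest, [confirmed.id]: confirmed };
}
```

Components should render temp-ID entities in a visibly pending state and avoid letting users share or link to them until the real id arrives.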
Caching Strategies
Caching is the difference between an app that feels instant and one that makes users wait on every navigation.
Stale-While-Revalidate
The most important caching pattern in modern frontend development. Serve cached (potentially stale) data immediately, then revalidate in the background:
const { data: issues } = useQuery({
queryKey: ["issues", projectId],
queryFn: () => fetchIssues(projectId),
staleTime: 30_000, // data is "fresh" for 30 seconds
gcTime: 5 * 60_000, // keep in cache for 5 minutes after last use
refetchOnWindowFocus: true,
});
The user sees data immediately from cache. If it's stale, a background refetch updates it seamlessly. No loading spinners for navigating between pages you've already visited.
Cache Keys
Your cache key design determines how granular your cache invalidation can be:
// Hierarchical keys enable targeted invalidation
["issues"] // all issues
["issues", projectId] // issues for a project
["issues", projectId, { status, sort, page }] // specific filtered view
// Invalidate all issues for a project (including all filter combinations)
queryClient.invalidateQueries({ queryKey: ["issues", projectId] });
// Invalidate just the "todo" filtered view
queryClient.invalidateQueries({
queryKey: ["issues", projectId, { status: "todo" }],
});
Cache Invalidation
There are only two hard problems in computer science: cache invalidation, naming things, and off-by-one errors.
Three approaches, from simplest to most precise:
1. Time-based expiry (staleTime): Data becomes stale after N seconds. Simple, but imprecise -- you show stale data until the timer expires.
2. Event-driven invalidation: Invalidate when you know data changed:
const createIssue = useMutation({
mutationFn: (data: CreateIssueInput) => api.post("/issues", data),
onSuccess: () => {
// We know the issue list changed -- invalidate it
queryClient.invalidateQueries({ queryKey: ["issues", projectId] });
},
});
3. Real-time invalidation via WebSocket: The server tells you when data changed:
socket.addEventListener("message", (event) => {
const msg = JSON.parse(event.data) as ServerEvent;
if (msg.type === "issue.updated" || msg.type === "issue.created") {
queryClient.invalidateQueries({ queryKey: ["issues"] });
}
});
Error States in Data Models
Every piece of async data can be in one of several states. Model all of them, or your UI will break in production.
type AsyncState<T> =
| { status: "idle" }
| { status: "loading" }
| { status: "error"; error: Error; retryCount: number }
| { status: "success"; data: T; updatedAt: number }
| { status: "refreshing"; data: T; updatedAt: number };
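Discriminated unions pay off here the same way they did for WebSocket events: an exhaustive switch with a `never` check means adding a sixth status won't compile until every consumer handles it. A sketch (the type is restated so the snippet stands alone):

```typescript
type AsyncState<T> =
  | { status: "idle" }
  | { status: "loading" }
  | { status: "error"; error: Error; retryCount: number }
  | { status: "success"; data: T; updatedAt: number }
  | { status: "refreshing"; data: T; updatedAt: number };

function describeState<T>(state: AsyncState<T>): string {
  switch (state.status) {
    case "idle": return "not started";
    case "loading": return "loading";
    case "error": return `failed after ${state.retryCount} retries: ${state.error.message}`;
    case "success": return "loaded";
    case "refreshing": return "loaded, refreshing in background";
    default: {
      // If a new status is ever added, this assignment becomes a
      // compile error, pointing at every switch that needs updating.
      const exhaustive: never = state;
      return exhaustive;
    }
  }
}
```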
The refreshing state is subtle but important. It means "I have data, but I'm fetching a newer version." The UI should show the existing data (not a loading spinner) with a subtle refresh indicator. TanStack Query handles this with isFetching vs isLoading:
- `isLoading` = first load, no cached data yet (show skeleton)
- `isFetching` = any fetch, including background refetch (show subtle indicator)
- `isError` = fetch failed (show error with retry button)
function IssueList({ projectId }: { projectId: string }) {
const { data, isLoading, isFetching, isError, error, refetch } = useQuery({
queryKey: ["issues", projectId],
queryFn: () => fetchIssues(projectId),
});
if (isLoading) return <IssueListSkeleton />;
if (isError) return <ErrorState message={error.message} onRetry={refetch} />;
if (data.length === 0) return <EmptyState message="No issues yet" />;
return (
<>
{isFetching && <RefreshIndicator />}
{data.map((issue) => (
<IssueCard key={issue.id} issue={issue} />
))}
</>
);
}
1. Model every async state: idle, loading, error, success, and refreshing -- missing any one leads to broken UI in production.
2. Normalize entities that appear in multiple places and change frequently -- denormalize everything else.
3. Use cursor pagination for infinite lists and real-time data; use offset pagination only when users need to jump to arbitrary pages.
4. Use optimistic updates only for mutations where the client can fully predict the result -- never for server-computed fields.
5. Make cache keys hierarchical so you can invalidate at any level of granularity.
6. URL state is free state management -- if it belongs in the URL (filters, sort, page), put it there instead of Zustand.
7. WebSocket reconnection needs exponential backoff with jitter -- without it, server recovery causes a thundering herd.
| What developers do | What they should do |
|---|---|
| Storing the same entity data in multiple Zustand slices without normalization. A user's avatar changes and now half your UI shows the old one, half shows the new one. With normalization, update it once and every component that references that user ID sees the change. | Normalize shared entities into a single lookup table; reference by ID elsewhere |
| Using offset pagination for an infinite-scroll feed with real-time inserts. New items inserted at the top shift every offset by one. Users see duplicate items or skip items entirely when scrolling to the next page. | Use cursor-based pagination anchored to a specific row ID or timestamp |
| Doing optimistic updates for create operations with a client-generated ID. If you generate a UUID client-side and the server generates a different one, you now have a phantom entity in your cache. Other users can't reference an issue by your client-generated ID. | Show a pending state for creates, or use a temporary ID that gets replaced on server confirmation |
| Putting filter state in Zustand when it's already in the URL search params. Two sources of truth inevitably drift: the URL says ?status=todo but Zustand says "all". Users can't bookmark or share filtered views if the state lives in memory. | Read filters from URL search params directly; use router.push as your setState |
| Showing a full-page loading spinner on every navigation between cached pages. Users visited this page 10 seconds ago and the data hasn't changed; making them stare at a spinner destroys the sense of speed that makes apps like Linear feel fast. | Use stale-while-revalidate: show cached data instantly, refetch in the background |