System Design: Autocomplete Search
The Component Every Senior Engineer Gets Asked About
Autocomplete search shows up in almost every frontend system design interview. Google, Algolia, Spotlight, Cmd+K palettes, e-commerce search bars — they all share the same fundamental architecture. And they all get the same things wrong when built naively.
Here is the thing most people miss: autocomplete is not a text input that fetches suggestions. It is a real-time coordination problem between user input, network requests, local data, keyboard navigation, screen readers, and perceived performance. Get any one of those wrong and the experience falls apart.
We will design this component using the RADIO framework — Requirements, Architecture, Data Model, Interface, Optimizations — the same structured approach used in FAANG system design rounds.
Think of autocomplete like a concierge at a hotel lobby. You start describing what you need — "I want a restaurant that..." — and the concierge starts suggesting options before you finish speaking. They do not wait for you to complete your sentence. They do not suggest the same thing twice. They prioritize what is most relevant. And if you change direction mid-sentence, they instantly pivot. A great concierge handles all of this seamlessly. A bad one makes you repeat yourself, gives stale suggestions, or talks over you. Your autocomplete component is that concierge.
The RADIO Pipeline
Before we dive into each stage, here is the complete flow from keystroke to rendered suggestion: keystroke → debounce → LRU cache check → abortable fetch → stale-response guard → rank and merge with recent searches → render with virtual focus.
R — Requirements
Every system design starts here. If you jump to code without nailing requirements, you will build the wrong thing. Split them into functional and non-functional.
Functional Requirements
Core:
- Type-ahead suggestions appear as the user types
- Suggestions update with every meaningful input change (after debounce)
- Recent searches persist across sessions (localStorage)
- Results are grouped by category (products, pages, users, recent searches)
- Keyboard navigation: arrow keys to move, Enter to select, Escape to close
- Clicking or selecting a suggestion navigates to the result
Secondary:
- Clear individual or all recent searches
- Highlight the matched portion of each suggestion
- Show a "no results" state with helpful messaging
- Support for mobile (touch targets, virtual keyboard considerations)
Non-Functional Requirements
| Requirement | Target | Why |
|---|---|---|
| Perceived response time | Under 100ms | Users perceive delays above 100ms as sluggish. Local cache + optimistic UI bridges the gap |
| Debounce interval | 250-300ms | Balance between responsiveness and server load. Too short floods the API, too long feels laggy |
| Accessibility | WCAG 2.1 AA | Combobox ARIA pattern is mandatory. Screen reader must announce suggestion count and selected item |
| Bundle size | Under 5KB gzipped | This loads on every page. Heavy search components destroy LCP |
| Offline resilience | Recent searches work offline | localStorage provides search history without network |
A — Architecture
Component Tree
```
SearchContainer
├── SearchInput (controlled input + trigger button)
├── SuggestionPanel (dropdown container, ARIA listbox)
│   ├── RecentSearches (localStorage history section)
│   │   └── SuggestionItem[]
│   ├── CategoryGroup (grouped server results)
│   │   ├── CategoryHeader
│   │   └── SuggestionItem[]
│   └── NoResults (empty state)
└── SearchOverlay (backdrop for mobile, click-away close)
```
Why this structure?
- SearchContainer owns all state: query string, suggestions, active index, open/closed. Single source of truth.
- SuggestionPanel is a pure presentational component. It receives suggestions and renders them. Zero data-fetching logic.
- SuggestionItem is reused across recent searches and server results. Same keyboard navigation, same click handler, same ARIA role.
- SearchOverlay handles the "click outside to close" pattern without attaching global event listeners on every render.
The Combobox ARIA Pattern
This is where most implementations fail their accessibility audit. Autocomplete is an ARIA combobox — one of the most complex ARIA patterns. Here is the contract:
```tsx
// SearchInput
<input
  role="combobox"
  aria-expanded={isOpen}
  aria-controls="suggestion-listbox"
  aria-activedescendant={activeId}
  aria-autocomplete="list"
  aria-haspopup="listbox"
/>

// SuggestionPanel
<ul
  id="suggestion-listbox"
  role="listbox"
  aria-label="Search suggestions"
>
  {suggestions.map((item, i) => (
    <li
      key={item.id}
      id={`suggestion-${item.id}`}
      role="option"
      aria-selected={i === activeIndex}
    >
      {item.label}
    </li>
  ))}
</ul>
```
The critical detail: aria-activedescendant on the input points to the ID of the currently highlighted suggestion. This tells screen readers which option is "focused" without actually moving DOM focus away from the input. The user keeps typing while arrows navigate the list — exactly how a sighted user experiences it.
Why aria-activedescendant, not moving focus
Moving focus to each suggestion item with li.focus() would break the typing experience. The user would lose their cursor position in the input field every time they press an arrow key. aria-activedescendant solves this by keeping DOM focus on the input while virtually pointing to the active suggestion. Screen readers like VoiceOver and NVDA announce the active option as if it were focused. This pattern is defined in the WAI-ARIA combobox specification, and it is the approach the WAI-ARIA Authoring Practices recommend for editable autocomplete components.
Focus Management
Tab → focuses the search input, opens panel if query exists
Arrow Down → moves to next suggestion (wraps to first)
Arrow Up → moves to previous suggestion (wraps to last)
Enter → selects the active suggestion, closes panel
Escape → first press clears the active suggestion and returns to the input, second press closes the panel
Home → moves to first suggestion
End → moves to last suggestion
Two-stage Escape is a subtle but important UX pattern. If the user has navigated to suggestion 5 and presses Escape, they probably want to return to the input — not close the entire panel. Only the second Escape dismisses.
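The arrow-key portion of this keymap reduces to a pure next-index computation, which keeps the onKeyDown handler trivial to unit-test. A minimal sketch under that assumption (the name `nextActiveIndex` is illustrative, not from the component tree above; Enter and the two-stage Escape are handled separately since they change more than the index):

```typescript
type NavKey = 'ArrowDown' | 'ArrowUp' | 'Home' | 'End'

// Compute the next active suggestion index for a navigation key.
// `current` of -1 means no suggestion is active yet.
function nextActiveIndex(
  key: NavKey,
  current: number,
  count: number
): number | null {
  if (count === 0) return null // nothing to navigate
  switch (key) {
    case 'ArrowDown':
      return (current + 1) % count // wraps to first
    case 'ArrowUp':
      return current <= 0 ? count - 1 : current - 1 // wraps to last
    case 'Home':
      return 0
    case 'End':
      return count - 1
  }
  return null // unreachable; satisfies noImplicitReturns
}
```

The handler then just sets aria-activedescendant to the suggestion at the returned index, so DOM focus never leaves the input.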
D — Data Model
Core Entities
```ts
type SuggestionCategory = 'recent' | 'product' | 'page' | 'user'

interface Suggestion {
  id: string
  label: string
  category: SuggestionCategory
  url: string
  icon?: string
  subtitle?: string
  matchRanges: Array<{ start: number; length: number }>
}

interface SearchState {
  query: string
  suggestions: Suggestion[]
  activeIndex: number
  isOpen: boolean
  isLoading: boolean
  error: string | null
}

interface SearchHistoryEntry {
  query: string
  timestamp: number
  resultCount: number
}
```
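The requirements call for recent searches persisting across sessions via localStorage, so SearchHistoryEntry needs read/write helpers. A sketch under stated assumptions: the storage interface is injected so the logic stays testable outside the browser, and the key name `search-history` and cap of 20 entries are illustrative choices, not from the article. The entry shape is repeated here for self-containment.

```typescript
interface SearchHistoryEntry {
  query: string
  timestamp: number
  resultCount: number
}

// Minimal subset of the Storage API, so tests can pass an in-memory fake.
interface StorageLike {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
}

const HISTORY_KEY = 'search-history'

function loadHistory(storage: StorageLike): SearchHistoryEntry[] {
  try {
    return JSON.parse(storage.getItem(HISTORY_KEY) ?? '[]')
  } catch {
    return [] // corrupted storage must never break search
  }
}

function saveSearch(storage: StorageLike, entry: SearchHistoryEntry): void {
  // Case-insensitive dedupe, newest entry first, capped at 20.
  const history = loadHistory(storage).filter(
    e => e.query.toLowerCase() !== entry.query.toLowerCase()
  )
  history.unshift(entry)
  storage.setItem(HISTORY_KEY, JSON.stringify(history.slice(0, 20)))
}
```

In the component, pass window.localStorage; the try/catch also covers browsers where storage access throws.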
Why matchRanges instead of pre-formatted HTML?
Because the server should not dictate presentation. matchRanges tells you which characters matched, and the client decides how to highlight them (bold, background color, underline). This keeps the API response format-agnostic and prevents XSS vectors from server-rendered HTML in suggestions.
Result Ranking
Suggestions should be ordered by:
- Exact prefix match (query "rea" → "React" ranks above "Reactive Programming")
- Recency (recent searches appear first in their category)
- Category priority (recent → products → pages → users)
- Popularity (server-side signal, if available)
```ts
function rankSuggestions(
  serverResults: Suggestion[],
  recentSearches: SearchHistoryEntry[],
  query: string
): Suggestion[] {
  const recent = recentSearches
    .filter(entry => entry.query.toLowerCase().startsWith(query.toLowerCase()))
    .slice(0, 3)
    .map(entry => ({
      id: `recent-${entry.query}`,
      label: entry.query,
      category: 'recent' as const,
      url: `/search?q=${encodeURIComponent(entry.query)}`,
      matchRanges: [{ start: 0, length: query.length }],
    }))

  const seen = new Set(recent.map(r => r.label.toLowerCase()))
  const deduped = serverResults.filter(
    s => !seen.has(s.label.toLowerCase())
  )

  return [...recent, ...deduped]
}
```
Never deduplicate suggestions by id alone. A recent search for "react hooks" and a server result for "React Hooks" are the same item to the user but have different IDs. Always normalize by label (case-insensitive) when merging local and server results.
I — Interface (API Design)
The Search Endpoint
```
GET /api/search/suggest?q=rea&limit=10&categories=product,page,user
```
Response:
```json
{
  "query": "rea",
  "suggestions": [
    {
      "id": "p-react-19",
      "label": "React 19 Deep Dive",
      "category": "page",
      "url": "/courses/react-19-deep-dive",
      "subtitle": "Course — 12 modules",
      "matchRanges": [{ "start": 0, "length": 3 }]
    }
  ],
  "total": 42,
  "cached": false
}
```
Why the response includes query: Race conditions. If the user types "rea", then "react", two requests fire. The "react" response might arrive before the "rea" response. By including the original query in the response, the client can discard stale results — only the response matching the current input gets rendered.
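Because the response echoes its query, the guard is a one-line comparison. A sketch assuming the response shape above (the names `acceptIfFresh` and `getCurrentQuery` are illustrative; the latter stands in for whatever reads the live input value):

```typescript
interface SuggestResponse {
  query: string
  suggestions: Array<{ id: string; label: string }>
}

// Accept a response only if it still matches what is in the input;
// return null for stale responses so the caller renders nothing.
function acceptIfFresh(
  response: SuggestResponse,
  getCurrentQuery: () => string
): SuggestResponse['suggestions'] | null {
  return response.query === getCurrentQuery().trim()
    ? response.suggestions
    : null // a newer keystroke superseded this request
}
```

This belt-and-suspenders check matters even with AbortController, since an abort issued after the response bytes arrive cannot un-deliver them.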
Request Management
Three mechanisms work together to prevent wasted requests:
1. Debounce (300ms)
```tsx
import { useEffect, useState } from 'react'

function useDebounce<T>(value: T, delay: number): T {
  const [debouncedValue, setDebouncedValue] = useState(value)

  useEffect(() => {
    const timer = setTimeout(() => setDebouncedValue(value), delay)
    return () => clearTimeout(timer)
  }, [value, delay])

  return debouncedValue
}
```
2. Abort previous request
```tsx
import { useEffect, useRef, useState } from 'react'

function useSuggestions(query: string) {
  const [suggestions, setSuggestions] = useState<Suggestion[]>([])
  const abortRef = useRef<AbortController | null>(null)

  useEffect(() => {
    if (query.length < 2) {
      abortRef.current?.abort() // cancel any in-flight request too
      setSuggestions([])
      return
    }

    abortRef.current?.abort()
    const controller = new AbortController()
    abortRef.current = controller

    fetch(`/api/search/suggest?q=${encodeURIComponent(query)}`, {
      signal: controller.signal,
    })
      .then(res => res.json())
      .then(data => setSuggestions(data.suggestions))
      .catch(err => {
        // Aborts are expected; anything else should surface, not vanish
        // (rethrowing here would only create an unhandled rejection)
        if (err.name !== 'AbortError') console.error(err)
      })

    return () => controller.abort()
  }, [query])

  return suggestions
}
```
3. LRU cache check before fetch
```ts
class LRUCache<V> {
  private cache = new Map<string, V>()

  constructor(private maxSize: number) {}

  get(key: string): V | undefined {
    const value = this.cache.get(key)
    if (value !== undefined) {
      this.cache.delete(key)
      this.cache.set(key, value)
    }
    return value
  }

  set(key: string, value: V): void {
    this.cache.delete(key)
    if (this.cache.size >= this.maxSize) {
      const oldest = this.cache.keys().next().value
      if (oldest !== undefined) this.cache.delete(oldest)
    }
    this.cache.set(key, value)
  }
}
```
The Map insertion order trick is the key insight. Map in JavaScript preserves insertion order. When you get a key, you delete and re-insert it — moving it to the "most recently used" end. The oldest entry is always keys().next().value. This gives you O(1) LRU eviction without a doubly-linked list.
O — Optimizations
This is where good autocomplete becomes great. Every optimization here solves a real problem at scale.
Debounce vs Throttle: Why Debounce Wins
Both limit how often a function fires. The difference matters here:
- Throttle fires at regular intervals (e.g., once per 300ms). The user sees intermediate results while still typing.
- Debounce waits until the user stops typing for 300ms, then fires once.
For autocomplete, debounce wins because:
- Intermediate results are distracting. Showing "r" results, then "re" results, then "rea" results creates visual noise.
- Each intermediate result triggers a re-render and layout recalculation of the suggestion list.
- The user is still typing — they do not care about partial results yet.
- Server load drops dramatically. A throttled search on "react hooks" fires 4-5 requests. Debounced, it fires 1-2.
The one exception: if your search API is fast enough (under 50ms) and you want to show results while the user types (like Google Instant), throttle with a short interval works. But this requires a custom backend optimized for prefix queries, not a general-purpose search API.
Trie for Local Prefix Matching
For recent searches and client-side data, a trie gives O(m) prefix lookup where m is the query length — independent of how many items exist:
```ts
class TrieNode {
  children = new Map<string, TrieNode>()
  items: Suggestion[] = []
}

class PrefixTrie {
  private root = new TrieNode()

  insert(key: string, item: Suggestion): void {
    let node = this.root
    for (const char of key.toLowerCase()) {
      if (!node.children.has(char)) {
        node.children.set(char, new TrieNode())
      }
      node = node.children.get(char)!
    }
    node.items.push(item)
  }

  search(prefix: string): Suggestion[] {
    let node = this.root
    for (const char of prefix.toLowerCase()) {
      const child = node.children.get(char)
      if (!child) return []
      node = child
    }
    return this.collectAll(node)
  }

  private collectAll(node: TrieNode): Suggestion[] {
    const results = [...node.items]
    for (const child of node.children.values()) {
      results.push(...this.collectAll(child))
    }
    return results
  }
}
```
For a small set of recent searches (under 100 items), Array.filter with startsWith is perfectly fine. The trie becomes valuable when you have thousands of local items — product catalogs, command palette entries, or cached suggestions from previous sessions.
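For reference, the small-scale baseline that paragraph describes is just a case-insensitive prefix filter. A sketch (the `Recent` shape and `filterRecent` name are illustrative):

```typescript
interface Recent {
  query: string
}

// O(n · m) linear scan: entirely adequate below ~100 items,
// and far simpler to maintain than a trie.
function filterRecent(items: Recent[], prefix: string): Recent[] {
  const p = prefix.toLowerCase()
  return items.filter(item => item.query.toLowerCase().startsWith(p))
}
```

Reach for the trie only once profiling shows this scan on the real dataset is too slow.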
Highlighting Matched Text
Do not use dangerouslySetInnerHTML or regex replacement to bold matched characters. Use the matchRanges from the API to split the label into segments:
```tsx
function HighlightedText({
  text,
  ranges,
}: {
  text: string
  ranges: Array<{ start: number; length: number }>
}) {
  if (ranges.length === 0) return <span>{text}</span>

  const segments: Array<{ text: string; highlighted: boolean }> = []
  let cursor = 0

  for (const range of ranges) {
    if (cursor < range.start) {
      segments.push({
        text: text.slice(cursor, range.start),
        highlighted: false,
      })
    }
    segments.push({
      text: text.slice(range.start, range.start + range.length),
      highlighted: true,
    })
    cursor = range.start + range.length
  }

  if (cursor < text.length) {
    segments.push({ text: text.slice(cursor), highlighted: false })
  }

  return (
    <span>
      {segments.map((seg, i) =>
        seg.highlighted ? (
          <mark key={i}>{seg.text}</mark>
        ) : (
          <span key={i}>{seg.text}</span>
        )
      )}
    </span>
  )
}
```
This approach is XSS-safe (no raw HTML injection), accessible (screen readers handle <mark> correctly), and the server controls what gets highlighted without shipping HTML.
Virtualizing Long Suggestion Lists
If your API returns 50+ suggestions, rendering all of them is wasteful. Only 8-10 are visible at any time. Use a virtualized list to render only visible items:
```ts
const ITEM_HEIGHT = 48
const VISIBLE_COUNT = 8

function getVisibleRange(scrollTop: number, totalCount: number) {
  const start = Math.floor(scrollTop / ITEM_HEIGHT)
  const end = Math.min(start + VISIBLE_COUNT + 2, totalCount)
  return { start, end }
}
```
For most autocomplete implementations with 10-20 results, virtualization is overkill. But for command palettes that search across hundreds of actions (VS Code, Linear, Notion), it is essential.
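Rendering then becomes a slice plus a transform offset. A sketch reusing the range math above, repeated here for self-containment (`windowItems` and the fixed item height are illustrative assumptions, not from a virtualization library):

```typescript
const ITEM_HEIGHT = 48
const VISIBLE_COUNT = 8

function getVisibleRange(scrollTop: number, totalCount: number) {
  const start = Math.floor(scrollTop / ITEM_HEIGHT)
  const end = Math.min(start + VISIBLE_COUNT + 2, totalCount)
  return { start, end }
}

// Which items to render, how far to translate them down, and the
// spacer height that keeps the scrollbar geometry correct.
function windowItems<T>(items: T[], scrollTop: number) {
  const { start, end } = getVisibleRange(scrollTop, items.length)
  return {
    visible: items.slice(start, end),
    offsetY: start * ITEM_HEIGHT,
    totalHeight: items.length * ITEM_HEIGHT,
  }
}
```

The list renders `visible` inside a container of `totalHeight`, with the window shifted by `translateY(offsetY)`.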
Preconnect to Search API
If your search API lives on a different origin, add a <link rel="preconnect"> in your document head:
```html
<link rel="preconnect" href="https://search-api.example.com" crossorigin />
```
This performs the DNS lookup, TCP connection, and TLS handshake before the first search request fires. Because cross-origin fetch calls use CORS, the crossorigin attribute is needed for the warmed connection to be reused. For a typical HTTPS connection, this saves 100-200ms on the first request — the difference between "fast" and "instant."
Request Deduplication
If the user types "react", deletes the "t", and retypes "t" within the cache TTL, the query "react" fires twice. Deduplicate with the LRU cache:
```ts
async function fetchSuggestions(
  query: string,
  cache: LRUCache<Suggestion[]>,
  signal: AbortSignal
): Promise<Suggestion[]> {
  const cached = cache.get(query)
  if (cached) return cached

  const res = await fetch(
    `/api/search/suggest?q=${encodeURIComponent(query)}`,
    { signal }
  )
  const data = await res.json()
  cache.set(query, data.suggestions)
  return data.suggestions
}
```
This cache should be per-session (not persisted to localStorage), with a max size of 50-100 entries and time-based invalidation if your data changes frequently.
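Time-based invalidation can be added by stamping each cached value. A sketch under stated assumptions: the 60-second TTL is an illustrative choice, and the clock is injectable so expiry is testable without waiting.

```typescript
interface TtlEntry<V> {
  value: V
  storedAt: number
}

const TTL_MS = 60_000 // illustrative: tune to how fast your data changes

// Map-backed cache where reads past the TTL count as misses.
class TtlCache<V> {
  private entries = new Map<string, TtlEntry<V>>()

  constructor(private now: () => number = Date.now) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (this.now() - entry.storedAt > TTL_MS) {
      this.entries.delete(key) // expired: evict lazily on read
      return undefined
    }
    return entry.value
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, storedAt: this.now() })
  }
}
```

In practice you would combine this with the LRU size cap from earlier; they address different failure modes (staleness vs. unbounded memory).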
Putting It All Together
The component moves through a small set of states — idle, debouncing, loading, open, and error — driven by keystrokes, debounce-timer expiry, and response arrival or failure.
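A minimal reducer sketch of that state machine, tying the debounce, fetch, and stale-response pieces together. The state names, action shapes, and string suggestions are illustrative simplifications, not a specific library's API:

```typescript
type SearchStatus = 'idle' | 'debouncing' | 'loading' | 'open' | 'error'

interface MachineState {
  status: SearchStatus
  query: string
  suggestions: string[]
}

type Action =
  | { type: 'TYPE'; query: string }        // keystroke: restart debounce
  | { type: 'DEBOUNCE_FIRED' }             // timer expired: fetch begins
  | { type: 'RESOLVED'; query: string; suggestions: string[] }
  | { type: 'FAILED' }
  | { type: 'DISMISS' }                    // Escape / click-away

function reduce(state: MachineState, action: Action): MachineState {
  switch (action.type) {
    case 'TYPE':
      return action.query.length < 2
        ? { status: 'idle', query: action.query, suggestions: [] }
        : { ...state, status: 'debouncing', query: action.query }
    case 'DEBOUNCE_FIRED':
      return state.status === 'debouncing'
        ? { ...state, status: 'loading' }
        : state
    case 'RESOLVED':
      // Stale-response guard: only the current query may render
      return action.query === state.query
        ? { ...state, status: 'open', suggestions: action.suggestions }
        : state
    case 'FAILED':
      return { ...state, status: 'error', suggestions: [] }
    case 'DISMISS':
      return { ...state, status: 'idle', suggestions: [] }
  }
}
```

Centralizing transitions like this makes illegal states (e.g. an open panel with no query) unrepresentable and keeps the side-effecting hooks thin.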
Common Pitfalls
| What developers do | What they should do |
|---|---|
| Use throttle instead of debounce for search input Throttle fires intermediate requests that produce flickering results. Debounce waits for a natural pause, sending fewer requests with more complete queries. | Use debounce to fire only after the user pauses typing |
| Move DOM focus to suggestion items on arrow key press Moving focus to list items removes the cursor from the input, preventing the user from continuing to type while browsing suggestions. | Keep focus on the input and use aria-activedescendant to indicate the active suggestion |
| Use innerHTML or dangerouslySetInnerHTML to highlight matched text Server-supplied HTML in suggestions is an XSS attack vector. Programmatic splitting is safe by construction. | Split the label string by matchRanges and render segments with mark elements |
| Only discard stale responses after they arrive Discarding a response client-side still downloads the full response body and consumes server compute. AbortController actually cancels the connection, saving bandwidth and resources. | Use AbortController to cancel in-flight requests when a new query fires |
| Store the entire search history array in React state Search history is session-persistent data, not UI state. Putting it in React state means it resets on page navigation and duplicates storage logic. | Use localStorage for search history and read it on mount or when the panel opens |
| Debounce with 50ms thinking it improves responsiveness 50ms is shorter than the inter-keystroke gap for most typists, so it fires on nearly every character — defeating the purpose of debouncing entirely. | Use 250-300ms debounce interval |
Essential Rules
1. Debounce search input at 250-300ms. Shorter defeats the purpose, longer feels laggy.
2. Abort in-flight requests with AbortController before sending new ones. Never let stale responses race.
3. Cache responses in an LRU cache (50-100 entries). Same query within seconds should never hit the network twice.
4. Use the ARIA combobox pattern: role=combobox on input, role=listbox on suggestions, aria-activedescendant for virtual focus.
5. Keep DOM focus on the input during keyboard navigation. Never move focus to suggestion list items.
6. Include the original query in API responses so the client can detect and discard stale results from race conditions.
7. Highlight matched text by splitting the string at matchRanges, not by injecting HTML. No dangerouslySetInnerHTML.
8. Preconnect to the search API origin to eliminate cold-start latency on the first request.