System Design: Autocomplete Search
The Component Every Senior Engineer Gets Asked About
Autocomplete search shows up in almost every frontend system design interview. Google, Algolia, Spotlight, Cmd+K palettes, e-commerce search bars — they all share the same fundamental architecture. And they all get the same things wrong when built naively.
Here is the thing most people miss: autocomplete is not a text input that fetches suggestions. It is a real-time coordination problem between user input, network requests, local data, keyboard navigation, screen readers, and perceived performance. Get any one of those wrong and the experience falls apart.
We will design this component using the RADIO framework — Requirements, Architecture, Data Model, Interface, Optimizations — the same structured approach used in FAANG system design rounds.
Think of autocomplete like a concierge at a hotel lobby. You start describing what you need — "I want a restaurant that..." — and the concierge starts suggesting options before you finish speaking. They do not wait for you to complete your sentence. They do not suggest the same thing twice. They prioritize what is most relevant. And if you change direction mid-sentence, they instantly pivot. A great concierge handles all of this seamlessly. A bad one makes you repeat yourself, gives stale suggestions, or talks over you. Your autocomplete component is that concierge.
The RADIO Pipeline
Before we dive into each stage, here is the complete flow from keystroke to rendered suggestion: keystroke → debounce → LRU cache check → abortable fetch → stale-response guard → rank and merge with recent searches → render with virtual focus.
R — Requirements
Every system design starts here. If you jump to code without nailing requirements, you will build the wrong thing. Split them into functional and non-functional.
Functional Requirements
Core:
- Type-ahead suggestions appear as the user types
- Suggestions update with every meaningful input change (after debounce)
- Recent searches persist across sessions (localStorage)
- Results are grouped by category (products, pages, users, recent searches)
- Keyboard navigation: arrow keys to move, Enter to select, Escape to close
- Clicking or selecting a suggestion navigates to the result
Secondary:
- Clear individual or all recent searches
- Highlight the matched portion of each suggestion
- Show a "no results" state with helpful messaging
- Support for mobile (touch targets, virtual keyboard considerations)
Non-Functional Requirements
| Requirement | Target | Why |
|---|---|---|
| Perceived response time | Under 100ms | Users perceive delays above 100ms as sluggish. Local cache + optimistic UI bridges the gap |
| Debounce interval | 250-300ms | Balance between responsiveness and server load. Too short floods the API, too long feels laggy |
| Accessibility | WCAG 2.1 AA | Combobox ARIA pattern is mandatory. Screen reader must announce suggestion count and selected item |
| Bundle size | Under 5KB gzipped | This loads on every page. Heavy search components destroy LCP |
| Offline resilience | Recent searches work offline | localStorage provides search history without network |
A — Architecture
Component Tree
```
SearchContainer
├── SearchInput (controlled input + trigger button)
├── SuggestionPanel (dropdown container, ARIA listbox)
│   ├── RecentSearches (localStorage history section)
│   │   └── SuggestionItem[]
│   ├── CategoryGroup (grouped server results)
│   │   ├── CategoryHeader
│   │   └── SuggestionItem[]
│   └── NoResults (empty state)
└── SearchOverlay (backdrop for mobile, click-away close)
```
Why this structure?
- SearchContainer owns all state: query string, suggestions, active index, open/closed. Single source of truth.
- SuggestionPanel is a pure presentational component. It receives suggestions and renders them. Zero data-fetching logic.
- SuggestionItem is reused across recent searches and server results. Same keyboard navigation, same click handler, same ARIA role.
- SearchOverlay handles the "click outside to close" pattern without attaching global event listeners on every render.
The Combobox ARIA Pattern
This is where most implementations fail their accessibility audit. Autocomplete is an ARIA combobox — one of the most complex ARIA patterns. Here is the contract:
```tsx
// SearchInput
<input
  role="combobox"
  aria-expanded={isOpen}
  aria-controls="suggestion-listbox"
  aria-activedescendant={activeId}
  aria-autocomplete="list"
  aria-haspopup="listbox"
/>

// SuggestionPanel
<ul
  id="suggestion-listbox"
  role="listbox"
  aria-label="Search suggestions"
>
  {suggestions.map((item, i) => (
    <li
      key={item.id}
      id={`suggestion-${item.id}`}
      role="option"
      aria-selected={i === activeIndex}
    >
      {item.label}
    </li>
  ))}
</ul>
```
The critical detail: aria-activedescendant on the input points to the ID of the currently highlighted suggestion. This tells screen readers which option is "focused" without actually moving DOM focus away from the input. The user keeps typing while arrows navigate the list — exactly how a sighted user experiences it.
Why aria-activedescendant, not moving focus
Moving focus to each suggestion item with li.focus() would break the typing experience. The user would lose their cursor position in the input field every time they press an arrow key. aria-activedescendant solves this by keeping DOM focus on the input while virtually pointing to the active suggestion. Screen readers like VoiceOver and NVDA announce the active option as if it were focused. This pattern is defined in the WAI-ARIA combobox specification, and it is the approach the WAI-ARIA Authoring Practices recommend for editable autocomplete components.
Focus Management
Tab → focuses the search input, opens panel if query exists
Arrow Down → moves to next suggestion (wraps to first)
Arrow Up → moves to previous suggestion (wraps to last)
Enter → selects the active suggestion, closes panel
Escape → first press clears the active suggestion and returns to the input, second press closes the panel
Home → moves to first suggestion
End → moves to last suggestion
Two-stage Escape is a subtle but important UX pattern. If the user has navigated to suggestion 5 and presses Escape, they probably want to return to the input — not close the entire panel. Only the second Escape dismisses.
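The arrow-key portion of this keymap reduces to a pure next-index computation, which keeps the onKeyDown handler trivial to unit-test. A minimal sketch under that assumption (the name `nextActiveIndex` is illustrative, not from the component tree above; Enter and the two-stage Escape are handled separately since they change more than the index):

```typescript
type NavKey = 'ArrowDown' | 'ArrowUp' | 'Home' | 'End'

// Compute the next active suggestion index for a navigation key.
// `current` of -1 means no suggestion is active yet.
function nextActiveIndex(
  key: NavKey,
  current: number,
  count: number
): number | null {
  if (count === 0) return null // nothing to navigate
  switch (key) {
    case 'ArrowDown':
      return (current + 1) % count // wraps to first
    case 'ArrowUp':
      return current <= 0 ? count - 1 : current - 1 // wraps to last
    case 'Home':
      return 0
    case 'End':
      return count - 1
  }
  return null // unreachable; satisfies noImplicitReturns
}
```

The handler then just sets aria-activedescendant to the suggestion at the returned index, so DOM focus never leaves the input.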
D — Data Model
Core Entities
```ts
type SuggestionCategory = 'recent' | 'product' | 'page' | 'user'

interface Suggestion {
  id: string
  label: string
  category: SuggestionCategory
  url: string
  icon?: string
  subtitle?: string
  matchRanges: Array<{ start: number; length: number }>
}

interface SearchState {
  query: string
  suggestions: Suggestion[]
  activeIndex: number
  isOpen: boolean
  isLoading: boolean
  error: string | null
}

interface SearchHistoryEntry {
  query: string
  timestamp: number
  resultCount: number
}
```
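The requirements call for recent searches persisting across sessions via localStorage, so SearchHistoryEntry needs read/write helpers. A sketch under stated assumptions: the storage interface is injected so the logic stays testable outside the browser, and the key name `search-history` and cap of 20 entries are illustrative choices, not from the article. The entry shape is repeated here for self-containment.

```typescript
interface SearchHistoryEntry {
  query: string
  timestamp: number
  resultCount: number
}

// Minimal subset of the Storage API, so tests can pass an in-memory fake.
interface StorageLike {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
}

const HISTORY_KEY = 'search-history'

function loadHistory(storage: StorageLike): SearchHistoryEntry[] {
  try {
    return JSON.parse(storage.getItem(HISTORY_KEY) ?? '[]')
  } catch {
    return [] // corrupted storage must never break search
  }
}

function saveSearch(storage: StorageLike, entry: SearchHistoryEntry): void {
  // Case-insensitive dedupe, newest entry first, capped at 20.
  const history = loadHistory(storage).filter(
    e => e.query.toLowerCase() !== entry.query.toLowerCase()
  )
  history.unshift(entry)
  storage.setItem(HISTORY_KEY, JSON.stringify(history.slice(0, 20)))
}
```

In the component, pass window.localStorage; the try/catch also covers browsers where storage access throws.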
Why matchRanges instead of pre-formatted HTML?
Because the server should not dictate presentation. matchRanges tells you which characters matched, and the client decides how to highlight them (bold, background color, underline). This keeps the API response format-agnostic and prevents XSS vectors from server-rendered HTML in suggestions.
Result Ranking
Suggestions should be ordered by:
- Exact prefix match (query "rea" → "React" ranks above "Reactive Programming")
- Recency (recent searches appear first in their category)
- Category priority (recent → products → pages → users)
- Popularity (server-side signal, if available)
```ts
function rankSuggestions(
  serverResults: Suggestion[],
  recentSearches: SearchHistoryEntry[],
  query: string
): Suggestion[] {
  const recent = recentSearches
    .filter(entry => entry.query.toLowerCase().startsWith(query.toLowerCase()))
    .slice(0, 3)
    .map(entry => ({
      id: `recent-${entry.query}`,
      label: entry.query,
      category: 'recent' as const,
      url: `/search?q=${encodeURIComponent(entry.query)}`,
      matchRanges: [{ start: 0, length: query.length }],
    }))

  const seen = new Set(recent.map(r => r.label.toLowerCase()))
  const deduped = serverResults.filter(
    s => !seen.has(s.label.toLowerCase())
  )

  return [...recent, ...deduped]
}
```
Never deduplicate suggestions by id alone. A recent search for "react hooks" and a server result for "React Hooks" are the same item to the user but have different IDs. Always normalize by label (case-insensitive) when merging local and server results.
I — Interface (API Design)
The Search Endpoint
```
GET /api/search/suggest?q=rea&limit=10&categories=product,page,user
```
Response:
```json
{
  "query": "rea",
  "suggestions": [
    {
      "id": "p-react-19",
      "label": "React 19 Deep Dive",
      "category": "page",
      "url": "/courses/react-19-deep-dive",
      "subtitle": "Course — 12 modules",
      "matchRanges": [{ "start": 0, "length": 3 }]
    }
  ],
  "total": 42,
  "cached": false
}
```
Why the response includes query: Race conditions. If the user types "rea", then "react", two requests fire. The "react" response might arrive before the "rea" response. By including the original query in the response, the client can discard stale results — only the response matching the current input gets rendered.
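Because the response echoes its query, the guard is a one-line comparison. A sketch assuming the response shape above (the names `acceptIfFresh` and `getCurrentQuery` are illustrative; the latter stands in for whatever reads the live input value):

```typescript
interface SuggestResponse {
  query: string
  suggestions: Array<{ id: string; label: string }>
}

// Accept a response only if it still matches what is in the input;
// return null for stale responses so the caller renders nothing.
function acceptIfFresh(
  response: SuggestResponse,
  getCurrentQuery: () => string
): SuggestResponse['suggestions'] | null {
  return response.query === getCurrentQuery().trim()
    ? response.suggestions
    : null // a newer keystroke superseded this request
}
```

This belt-and-suspenders check matters even with AbortController, since an abort issued after the response bytes arrive cannot un-deliver them.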
Request Management
Three mechanisms work together to prevent wasted requests:
1. Debounce (300ms)
```tsx
import { useEffect, useState } from 'react'

function useDebounce<T>(value: T, delay: number): T {
  const [debouncedValue, setDebouncedValue] = useState(value)

  useEffect(() => {
    const timer = setTimeout(() => setDebouncedValue(value), delay)
    return () => clearTimeout(timer)
  }, [value, delay])

  return debouncedValue
}
```
2. Abort previous request
```tsx
import { useEffect, useRef, useState } from 'react'

function useSuggestions(query: string) {
  const [suggestions, setSuggestions] = useState<Suggestion[]>([])
  const abortRef = useRef<AbortController | null>(null)

  useEffect(() => {
    if (query.length < 2) {
      abortRef.current?.abort() // cancel any in-flight request too
      setSuggestions([])
      return
    }

    abortRef.current?.abort()
    const controller = new AbortController()
    abortRef.current = controller

    fetch(`/api/search/suggest?q=${encodeURIComponent(query)}`, {
      signal: controller.signal,
    })
      .then(res => res.json())
      .then(data => setSuggestions(data.suggestions))
      .catch(err => {
        // Aborts are expected; anything else should surface, not vanish
        // (rethrowing here would only create an unhandled rejection)
        if (err.name !== 'AbortError') console.error(err)
      })

    return () => controller.abort()
  }, [query])

  return suggestions
}
```
3. LRU cache check before fetch
```ts
class LRUCache<V> {
  private cache = new Map<string, V>()

  constructor(private maxSize: number) {}

  get(key: string): V | undefined {
    const value = this.cache.get(key)
    if (value !== undefined) {
      this.cache.delete(key)
      this.cache.set(key, value)
    }
    return value
  }

  set(key: string, value: V): void {
    this.cache.delete(key)
    if (this.cache.size >= this.maxSize) {
      const oldest = this.cache.keys().next().value
      if (oldest !== undefined) this.cache.delete(oldest)
    }
    this.cache.set(key, value)
  }
}
```
The Map insertion order trick is the key insight. Map in JavaScript preserves insertion order. When you get a key, you delete and re-insert it — moving it to the "most recently used" end. The oldest entry is always keys().next().value. This gives you O(1) LRU eviction without a doubly-linked list.
O — Optimizations
This is where good autocomplete becomes great. Every optimization here solves a real problem at scale.
Debounce vs Throttle: Why Debounce Wins
Both limit how often a function fires. The difference matters here:
- Throttle fires at regular intervals (e.g., once per 300ms). The user sees intermediate results while still typing.
- Debounce waits until the user stops typing for 300ms, then fires once.
For autocomplete, debounce wins because:
- Intermediate results are distracting. Showing "r" results, then "re" results, then "rea" results creates visual noise.
- Each intermediate result triggers a re-render and layout recalculation of the suggestion list.
- The user is still typing — they do not care about partial results yet.
- Server load drops dramatically. A throttled search on "react hooks" fires 4-5 requests. Debounced, it fires 1-2.
The one exception: if your search API is fast enough (under 50ms) and you want to show results while the user types (like Google Instant), throttle with a short interval works. But this requires a custom backend optimized for prefix queries, not a general-purpose search API.
Trie for Local Prefix Matching
For recent searches and client-side data, a trie gives O(m) prefix lookup where m is the query length — independent of how many items exist:
```ts
class TrieNode {
  children = new Map<string, TrieNode>()
  items: Suggestion[] = []
}

class PrefixTrie {
  private root = new TrieNode()

  insert(key: string, item: Suggestion): void {
    let node = this.root
    for (const char of key.toLowerCase()) {
      if (!node.children.has(char)) {
        node.children.set(char, new TrieNode())
      }
      node = node.children.get(char)!
    }
    node.items.push(item)
  }

  search(prefix: string): Suggestion[] {
    let node = this.root
    for (const char of prefix.toLowerCase()) {
      const child = node.children.get(char)
      if (!child) return []
      node = child
    }
    return this.collectAll(node)
  }

  private collectAll(node: TrieNode): Suggestion[] {
    const results = [...node.items]
    for (const child of node.children.values()) {
      results.push(...this.collectAll(child))
    }
    return results
  }
}
```
For a small set of recent searches (under 100 items), Array.filter with startsWith is perfectly fine. The trie becomes valuable when you have thousands of local items — product catalogs, command palette entries, or cached suggestions from previous sessions.
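For reference, the small-scale baseline that paragraph describes is just a case-insensitive prefix filter. A sketch (the `Recent` shape and `filterRecent` name are illustrative):

```typescript
interface Recent {
  query: string
}

// O(n · m) linear scan: entirely adequate below ~100 items,
// and far simpler to maintain than a trie.
function filterRecent(items: Recent[], prefix: string): Recent[] {
  const p = prefix.toLowerCase()
  return items.filter(item => item.query.toLowerCase().startsWith(p))
}
```

Reach for the trie only once profiling shows this scan on the real dataset is too slow.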
Highlighting Matched Text
Do not use dangerouslySetInnerHTML or regex replacement to bold matched characters. Use the matchRanges from the API to split the label into segments:
```tsx
function HighlightedText({
  text,
  ranges,
}: {
  text: string
  ranges: Array<{ start: number; length: number }>
}) {
  if (ranges.length === 0) return <span>{text}</span>

  const segments: Array<{ text: string; highlighted: boolean }> = []
  let cursor = 0

  for (const range of ranges) {
    if (cursor < range.start) {
      segments.push({
        text: text.slice(cursor, range.start),
        highlighted: false,
      })
    }
    segments.push({
      text: text.slice(range.start, range.start + range.length),
      highlighted: true,
    })
    cursor = range.start + range.length
  }

  if (cursor < text.length) {
    segments.push({ text: text.slice(cursor), highlighted: false })
  }

  return (
    <span>
      {segments.map((seg, i) =>
        seg.highlighted ? (
          <mark key={i}>{seg.text}</mark>
        ) : (
          <span key={i}>{seg.text}</span>
        )
      )}
    </span>
  )
}
```
This approach is XSS-safe (no raw HTML injection), accessible (screen readers handle <mark> correctly), and the server controls what gets highlighted without shipping HTML.
Virtualizing Long Suggestion Lists
If your API returns 50+ suggestions, rendering all of them is wasteful. Only 8-10 are visible at any time. Use a virtualized list to render only visible items:
```ts
const ITEM_HEIGHT = 48
const VISIBLE_COUNT = 8

function getVisibleRange(scrollTop: number, totalCount: number) {
  const start = Math.floor(scrollTop / ITEM_HEIGHT)
  const end = Math.min(start + VISIBLE_COUNT + 2, totalCount)
  return { start, end }
}
```
For most autocomplete implementations with 10-20 results, virtualization is overkill. But for command palettes that search across hundreds of actions (VS Code, Linear, Notion), it is essential.
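Rendering then becomes a slice plus a transform offset. A sketch reusing the range math above, repeated here for self-containment (`windowItems` and the fixed item height are illustrative assumptions, not from a virtualization library):

```typescript
const ITEM_HEIGHT = 48
const VISIBLE_COUNT = 8

function getVisibleRange(scrollTop: number, totalCount: number) {
  const start = Math.floor(scrollTop / ITEM_HEIGHT)
  const end = Math.min(start + VISIBLE_COUNT + 2, totalCount)
  return { start, end }
}

// Which items to render, how far to translate them down, and the
// spacer height that keeps the scrollbar geometry correct.
function windowItems<T>(items: T[], scrollTop: number) {
  const { start, end } = getVisibleRange(scrollTop, items.length)
  return {
    visible: items.slice(start, end),
    offsetY: start * ITEM_HEIGHT,
    totalHeight: items.length * ITEM_HEIGHT,
  }
}
```

The list renders `visible` inside a container of `totalHeight`, with the window shifted by `translateY(offsetY)`.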
Preconnect to Search API
If your search API lives on a different origin, add a <link rel="preconnect"> in your document head:
```html
<link rel="preconnect" href="https://search-api.example.com" crossorigin />
```
This performs the DNS lookup, TCP connection, and TLS handshake before the first search request fires. Because cross-origin fetch calls use CORS, the crossorigin attribute is needed for the warmed connection to be reused. For a typical HTTPS connection, this saves 100-200ms on the first request — the difference between "fast" and "instant."
Request Deduplication
If the user types "react", deletes the "t", and retypes "t" within the cache TTL, the query "react" fires twice. Deduplicate with the LRU cache:
```ts
async function fetchSuggestions(
  query: string,
  cache: LRUCache<Suggestion[]>,
  signal: AbortSignal
): Promise<Suggestion[]> {
  const cached = cache.get(query)
  if (cached) return cached

  const res = await fetch(
    `/api/search/suggest?q=${encodeURIComponent(query)}`,
    { signal }
  )
  const data = await res.json()
  cache.set(query, data.suggestions)
  return data.suggestions
}
```
This cache should be per-session (not persisted to localStorage), with a max size of 50-100 entries and time-based invalidation if your data changes frequently.
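Time-based invalidation can be added by stamping each cached value. A sketch under stated assumptions: the 60-second TTL is an illustrative choice, and the clock is injectable so expiry is testable without waiting.

```typescript
interface TtlEntry<V> {
  value: V
  storedAt: number
}

const TTL_MS = 60_000 // illustrative: tune to how fast your data changes

// Map-backed cache where reads past the TTL count as misses.
class TtlCache<V> {
  private entries = new Map<string, TtlEntry<V>>()

  constructor(private now: () => number = Date.now) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (this.now() - entry.storedAt > TTL_MS) {
      this.entries.delete(key) // expired: evict lazily on read
      return undefined
    }
    return entry.value
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, storedAt: this.now() })
  }
}
```

In practice you would combine this with the LRU size cap from earlier; they address different failure modes (staleness vs. unbounded memory).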
Putting It All Together
The component moves through a small set of states — idle, debouncing, loading, open, and error — driven by keystrokes, debounce-timer expiry, and response arrival or failure.
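A minimal reducer sketch of that state machine, tying the debounce, fetch, and stale-response pieces together. The state names, action shapes, and string suggestions are illustrative simplifications, not a specific library's API:

```typescript
type SearchStatus = 'idle' | 'debouncing' | 'loading' | 'open' | 'error'

interface MachineState {
  status: SearchStatus
  query: string
  suggestions: string[]
}

type Action =
  | { type: 'TYPE'; query: string }        // keystroke: restart debounce
  | { type: 'DEBOUNCE_FIRED' }             // timer expired: fetch begins
  | { type: 'RESOLVED'; query: string; suggestions: string[] }
  | { type: 'FAILED' }
  | { type: 'DISMISS' }                    // Escape / click-away

function reduce(state: MachineState, action: Action): MachineState {
  switch (action.type) {
    case 'TYPE':
      return action.query.length < 2
        ? { status: 'idle', query: action.query, suggestions: [] }
        : { ...state, status: 'debouncing', query: action.query }
    case 'DEBOUNCE_FIRED':
      return state.status === 'debouncing'
        ? { ...state, status: 'loading' }
        : state
    case 'RESOLVED':
      // Stale-response guard: only the current query may render
      return action.query === state.query
        ? { ...state, status: 'open', suggestions: action.suggestions }
        : state
    case 'FAILED':
      return { ...state, status: 'error', suggestions: [] }
    case 'DISMISS':
      return { ...state, status: 'idle', suggestions: [] }
  }
}
```

Centralizing transitions like this makes illegal states (e.g. an open panel with no query) unrepresentable and keeps the side-effecting hooks thin.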
Common Pitfalls
| What developers do | What they should do |
|---|---|
| Use throttle instead of debounce for search input Throttle fires intermediate requests that produce flickering results. Debounce waits for a natural pause, sending fewer requests with more complete queries. | Use debounce to fire only after the user pauses typing |
| Move DOM focus to suggestion items on arrow key press Moving focus to list items removes the cursor from the input, preventing the user from continuing to type while browsing suggestions. | Keep focus on the input and use aria-activedescendant to indicate the active suggestion |
| Use innerHTML or dangerouslySetInnerHTML to highlight matched text Server-supplied HTML in suggestions is an XSS attack vector. Programmatic splitting is safe by construction. | Split the label string by matchRanges and render segments with mark elements |
| Only discard stale responses after they arrive Discarding a response client-side still downloads the full response body and consumes server compute. AbortController actually cancels the connection, saving bandwidth and resources. | Use AbortController to cancel in-flight requests when a new query fires |
| Store the entire search history array in React state Search history is session-persistent data, not UI state. Putting it in React state means it resets on page navigation and duplicates storage logic. | Use localStorage for search history and read it on mount or when the panel opens |
| Debounce with 50ms thinking it improves responsiveness 50ms is shorter than the inter-keystroke gap for most typists, so it fires on nearly every character — defeating the purpose of debouncing entirely. | Use 250-300ms debounce interval |
Essential Rules
1. Debounce search input at 250-300ms. Shorter defeats the purpose, longer feels laggy.
2. Abort in-flight requests with AbortController before sending new ones. Never let stale responses race.
3. Cache responses in an LRU cache (50-100 entries). Same query within seconds should never hit the network twice.
4. Use the ARIA combobox pattern: role=combobox on input, role=listbox on suggestions, aria-activedescendant for virtual focus.
5. Keep DOM focus on the input during keyboard navigation. Never move focus to suggestion list items.
6. Include the original query in API responses so the client can detect and discard stale results from race conditions.
7. Highlight matched text by splitting the string at matchRanges, not by injecting HTML. No dangerouslySetInnerHTML.
8. Preconnect to the search API origin to eliminate cold-start latency on the first request.