
Lab Data vs Field Data

Advanced · 18 min read

Your Lighthouse Score Is Not Your Users' Experience

You run Lighthouse, see a green 95 performance score, ship with confidence. Two weeks later, your CrUX data shows half your users have LCP over 4 seconds. What happened?

Nothing went wrong with the tool. The problem is that Lighthouse and your users are measuring completely different realities. Lighthouse runs on your fast MacBook, on a stable Wi-Fi connection, with a clean browser profile. Your actual users are on a 2019 Android phone in Jakarta, riding the subway with spotty 4G, with 47 browser tabs open.

This gap between lab data and field data is one of the most misunderstood concepts in web performance. Understanding it is the difference between optimizing for benchmarks and optimizing for real humans.

Mental Model

Lab data is like testing a car on a perfectly smooth track at a controlled temperature. Field data is like tracking how that car performs across millions of drivers on real roads — potholes, rain, traffic, altitude changes. Both are useful. Neither alone tells the full story. The track gives you repeatable, debuggable results. The road gives you the truth.

Lab Data: The Controlled Experiment

Lab data comes from running performance tools in a controlled, synthetic environment. You choose the device profile, network speed, and conditions. The same test run twice produces nearly identical results.

Common Lab Tools

  • Lighthouse — Built into Chrome DevTools. Simulates a mid-tier mobile device with throttled CPU (4x slowdown) and network (simulated slow 4G). Runs locally in your browser
  • WebPageTest — Remote testing from real locations worldwide. Supports real device emulation, multi-step scripting, video comparison, and waterfall analysis. The gold standard for deep lab analysis
  • Chrome DevTools Performance panel — Records runtime performance traces. Shows flame charts, main thread blocking, layout shifts, and long tasks. No throttling by default (you must enable it manually)
  • PageSpeed Insights — Runs Lighthouse remotely on Google's servers AND shows real CrUX field data. The bridge between lab and field

What Lab Data Does Well

Lab data shines in three areas:

  1. Reproducibility — Same test, same conditions, same results. You can A/B test code changes with confidence that differences come from your code, not from network variability
  2. Debugging — Waterfall charts, flame charts, and frame-by-frame rendering timelines let you pinpoint exactly what is slow and why. Field data tells you that something is slow. Lab data tells you why
  3. Pre-production testing — You can test a staging deploy before real users ever see it. Field data requires real traffic

Where Lab Data Fails

Here is the uncomfortable truth: lab data is a fiction. A useful fiction, but a fiction nonetheless.

  • Throttling is not real slowness. Lighthouse simulates a slow network by adding delays. A real slow 3G connection has packet loss, jitter, variable latency, and TCP retransmissions that throttling cannot replicate. Simulated 4G behaves nothing like actual 4G on a crowded cell tower
  • CPU throttling is crude. Lighthouse applies a multiplier to slow down JavaScript execution. But real low-end devices have smaller caches, weaker GPUs, thermal throttling, and background processes competing for resources. A 4x CPU slowdown on an M3 MacBook does not equal a Snapdragon 665
  • No real user diversity. Lab tests use one device profile, one viewport, one network. Your real audience has thousands of combinations. The person on a Jio network in rural India has a fundamentally different experience than someone on fiber in Seoul
  • No interaction patterns. Lighthouse measures load performance. It does not measure what happens when the user scrolls, clicks, types, or navigates between pages. INP (Interaction to Next Paint) can only be measured meaningfully with real interactions
Quiz
A developer's Lighthouse score shows LCP of 1.2 seconds, but CrUX data reports LCP at 3.8 seconds for the same page. What is the most likely explanation?

Field Data: The Ground Truth

Field data (also called Real User Monitoring, or RUM) captures performance metrics from actual users visiting your site. Every page load, every interaction, every layout shift — measured on their real device, their real network, in their real context.

Sources of Field Data

Chrome User Experience Report (CrUX) is the largest public dataset of real-user performance data. Chrome collects anonymized metrics from users who have opted into usage statistics syncing. Key facts:

  • Covers millions of websites; the BigQuery tables are updated monthly, while the CrUX API serves a 28-day rolling window refreshed daily
  • Reports Core Web Vitals: LCP, INP, CLS
  • Provides origin-level and URL-level data
  • Accessible via PageSpeed Insights, BigQuery, and the CrUX API
  • Used by Google for Search ranking signals
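CrUX buckets each metric against the published Core Web Vitals thresholds: "good" is LCP ≤ 2.5 s, INP ≤ 200 ms, CLS ≤ 0.1; "poor" is LCP > 4 s, INP > 500 ms, CLS > 0.25. A minimal sketch of that mapping (the function name is illustrative):

```javascript
// Core Web Vitals thresholds: [good upper bound, poor lower bound].
// LCP and INP are in milliseconds; CLS is unitless.
const THRESHOLDS = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
};

// Map a P75 metric value to the rating CrUX and PSI would report.
function rateMetric(name, p75) {
  const [good, poor] = THRESHOLDS[name];
  if (p75 <= good) return 'good';
  if (p75 <= poor) return 'needs-improvement';
  return 'poor';
}

console.log(rateMetric('LCP', 2400)); // 'good'
console.log(rateMetric('INP', 350));  // 'needs-improvement'
console.log(rateMetric('CLS', 0.3));  // 'poor'
```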

The web-vitals JavaScript library lets you collect field data from your own users. It hooks into browser Performance APIs to measure LCP, INP, CLS, FCP, and TTFB, then gives you a callback to send that data wherever you want.

Commercial RUM tools (Datadog RUM, New Relic Browser, SpeedCurve, Sentry Performance) provide dashboards, alerting, and deep segmentation on top of raw field data.

What Field Data Does Well

  1. Truth — This is what your users actually experience. No simulation, no throttling, no guessing
  2. Distribution visibility — You see the full range: the fast users, the slow users, the median, the long tail. Lab data gives you one data point. Field data gives you millions
  3. Impact on business — You can correlate real performance with real conversion rates, bounce rates, and engagement. "Users with LCP under 2.5s have 23% higher conversion" is a field-data insight
  4. INP measurement — Interaction to Next Paint requires real user interactions. You cannot meaningfully measure INP in a lab test (Lighthouse reports TBT as a proxy, but it is not the same thing)

Where Field Data Falls Short

  • No debugging capability. Field data tells you LCP is 4.2 seconds at the 75th percentile. It does not tell you why. Was it a slow server response? A render-blocking stylesheet? A massive hero image? You need lab tools to diagnose
  • Requires real traffic. New pages, pre-launch features, and staging environments have no field data. You are flying blind until real users visit
  • Delayed feedback. CrUX updates monthly. Even your own RUM data requires enough samples to be statistically meaningful. Lab data gives you feedback in seconds
  • Privacy constraints. You cannot collect identifying information about individual user sessions (nor should you). This limits how deep you can segment
Quiz
You need to diagnose why INP is failing on a specific page. Which approach gives you the most actionable debugging information?

Why Lab and Field Disagree

The disagreement is not random. There are specific, predictable reasons why lab and field numbers diverge.

| Dimension | Lab data | Field data |
| --- | --- | --- |
| Device | Simulated mid-tier phone (CPU throttling) | Real devices: flagship phones, budget Androids, old iPhones, tablets, desktops |
| Network | Simulated throttled connection (fixed latency + bandwidth) | Real networks: 3G, 4G, 5G, Wi-Fi, satellite, with jitter and packet loss |
| User behavior | Cold load only, no interaction | Scroll, click, type, navigate back, switch tabs, multi-step flows |
| Geography | Single test location | Global distribution across all regions and ISPs |
| Browser state | Clean profile, no extensions | Dozens of extensions, cached resources, background tabs |
| Sample size | 1 test run (or a few) | Thousands to millions of real page loads |
| Timing | Snapshot at test time | 28-day rolling window (CrUX) or continuous (RUM) |
| Metrics available | LCP, CLS, TBT, FCP, SI, TTFB | LCP, INP, CLS, FCP, TTFB (real interactions) |
| Debugging | Full waterfall, flame chart, frame timeline | Aggregate numbers only (unless using attribution builds) |
| Speed of feedback | Seconds | Days to weeks for statistical significance |

The Throttling Problem

Lighthouse applies simulated throttling by default. It loads the page unthrottled, then estimates what the metrics would have been under slow conditions by replaying the observed request and task graph through a simulation model (Lantern). This is fast to run but misses real-world effects like TCP slow start, connection saturation, and resource contention.

WebPageTest supports applied throttling (also called packet-level throttling), which actually shapes traffic at the network layer. This is more realistic but still cannot replicate the behavior of a real congested mobile network where a cell tower is shared with thousands of other users.

Neither approach can simulate:

  • DNS resolution variability across ISPs
  • CDN edge cache misses for cold regions
  • TLS handshake overhead on slow CPUs
  • TCP congestion window resets from packet loss
  • Background app interference on mobile devices

The Device Problem

A 4x CPU slowdown on an Apple M3 chip does not produce the same behavior as a MediaTek Dimensity 700 running natively. The throttled M3 still has:

  • Larger L1/L2/L3 caches
  • Faster memory bandwidth
  • Better branch prediction
  • No thermal throttling (your MacBook has a fan, that budget phone does not)
  • No competition from other apps for RAM

The result: JavaScript that runs fine under simulated throttling can cause multi-second jank on real budget devices because the bottleneck is not raw CPU speed but memory pressure and thermal constraints.

Quiz
Lighthouse uses 'simulated throttling' by default. What does this actually mean?

Percentiles: P75 vs P95

When you look at field data, you are looking at a distribution, not a single number. The choice of which percentile to report changes the story completely.

P75 (75th percentile) means 75% of your users had an experience at or better than this value. Google uses P75 for Core Web Vitals thresholds and Search ranking. If your P75 LCP is 2.4 seconds, that means 75% of your users saw LCP in 2.4 seconds or less.

P95 (95th percentile) captures the experience of your worst-off users (excluding extreme outliers). If your P95 LCP is 8 seconds, that means 5% of your users — potentially millions of people at scale — waited 8 seconds or more for meaningful content.
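A quick sketch of how those numbers fall out of a distribution, using the nearest-rank method on raw samples (production pipelines usually aggregate into histograms or sketches rather than sorting every value):

```javascript
// Nearest-rank percentile: the value at or below which `p` percent
// of samples fall. Assumes a plain array of numeric metric values.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Eight hypothetical LCP samples in milliseconds.
const lcpSamples = [1200, 1400, 1800, 2100, 2600, 3100, 4200, 8000];

console.log(percentile(lcpSamples, 75)); // 3100
console.log(percentile(lcpSamples, 95)); // 8000
```

The same dataset reads very differently depending on which percentile you report: a P75 of 3.1 s already fails the LCP threshold, while the P95 of 8 s reveals a far worse tail.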

Why P75?

Google chose P75 as the threshold for Core Web Vitals because:

  • It is high enough to represent real pain points (not just the median, which hides the long tail)
  • It is not so extreme that a few outliers (bots, broken connections, users closing tabs) dominate the metric
  • It balances actionability with sensitivity — improvements at P75 are achievable and meaningful

When P95 Matters

P75 is where Google draws the line. P95 is where your worst user experiences live. Consider:

  • A site with 1 million daily page loads at P95 LCP of 8s means 50,000 page loads per day are painfully slow
  • These slow loads disproportionately affect users in developing markets — exactly the audience many companies are trying to grow
  • Conversion rate impact at the tail is often more severe than at the median

If you only optimize for P75, you are explicitly choosing to ignore the bottom 25% of your user base.

P75 is a floor, not a ceiling

Passing Core Web Vitals at P75 is the minimum bar for Search ranking benefits. If you are serious about performance, track P95 and P99 as well. The long tail is where your most frustrated users live.

The CrUX Dataset

The Chrome User Experience Report is the canonical source of field data for the public web. Understanding how it works — and its limitations — is essential.

How CrUX Collects Data

CrUX data comes from real Chrome users who meet these criteria:

  • Using Chrome on Android, ChromeOS, Linux, macOS, or Windows (not iOS — Chrome on iOS uses WebKit, not Blink)
  • Have usage statistic reporting enabled (opted in)
  • Have synced their browsing history

This means CrUX data skews toward Chrome users and excludes Safari (iOS), Firefox, and other browsers entirely. For sites with heavy iOS traffic, CrUX may not represent the full audience.

CrUX Data Access Points

PageSpeed Insights (PSI) — The easiest way to check CrUX data for any URL. Enter a URL and you get both lab results (Lighthouse) and field data (CrUX) side by side. This is the first place to check.

CrUX API — Programmatic access to origin-level and URL-level CrUX data. Free, requires an API key. Returns P75 values and histogram distributions for Core Web Vitals.
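The API is queried with a POST to its queryRecord endpoint. A sketch that builds the request body and pulls P75 values out of a response — the endpoint URL and response field paths follow the public CrUX API docs, but verify them against the current reference before relying on them:

```javascript
// Build the JSON body for a CrUX API queryRecord request.
// `formFactor` is optional: 'PHONE', 'DESKTOP', or 'TABLET'.
function buildCruxRequest(origin, formFactor) {
  const body = { origin };
  if (formFactor) body.formFactor = formFactor;
  return body;
}

// Extract P75 values from a queryRecord response. Field paths assume
// the documented shape: record.metrics.<metric>.percentiles.p75.
function extractP75(response) {
  const result = {};
  for (const [name, data] of Object.entries(response.record.metrics)) {
    if (data.percentiles) result[name] = data.percentiles.p75;
  }
  return result;
}

// Usage against the live API (requires an API key):
// fetch(`https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=${KEY}`, {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(buildCruxRequest('https://example.com', 'PHONE')),
// }).then((r) => r.json()).then((json) => console.log(extractP75(json)));
```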

BigQuery — The full CrUX dataset, updated monthly. Lets you run SQL queries across the entire dataset — compare sites, analyze trends, segment by connection type, device type, and country. Incredibly powerful for competitive analysis.

-- Example: Query LCP distribution for a specific origin
SELECT
  origin,
  effective_connection_type.name AS connection_type,
  largest_contentful_paint.histogram AS lcp_histogram,
  largest_contentful_paint.percentiles.p75 AS lcp_p75
FROM
  `chrome-ux-report.all.202403`
WHERE
  origin = 'https://example.com'

CrUX Dashboard — A Looker Studio (formerly Google Data Studio) template that auto-generates trend charts from CrUX BigQuery data. Plug in your origin and get historical trends without writing SQL.

Quiz
A site has heavy traffic from Safari on iOS. A developer checks CrUX and sees great Core Web Vitals scores. Should they be confident in their performance?

Implementing RUM with the web-vitals Library

The web-vitals library is a tiny (under 2KB gzipped) library maintained by the Chrome team. It provides reliable, accurate measurements of all Core Web Vitals using the same underlying browser APIs that CrUX uses.

Basic Setup

import { onLCP, onINP, onCLS, onFCP, onTTFB } from 'web-vitals';

function sendToAnalytics(metric) {
  const body = JSON.stringify({
    name: metric.name,
    value: metric.value,
    rating: metric.rating,
    delta: metric.delta,
    id: metric.id,
    navigationType: metric.navigationType,
  });

  if (navigator.sendBeacon) {
    navigator.sendBeacon('/api/vitals', body);
  } else {
    fetch('/api/vitals', { body, method: 'POST', keepalive: true });
  }
}

onLCP(sendToAnalytics);
onINP(sendToAnalytics);
onCLS(sendToAnalytics);
onFCP(sendToAnalytics);
onTTFB(sendToAnalytics);

A few critical details in this code:

  • navigator.sendBeacon is essential. Unlike a plain fetch, sendBeacon is designed to deliver data reliably even when the user navigates away or closes the tab. Metrics like CLS and INP report their final values as the page unloads — if you use fetch without keepalive: true, the request may be canceled
  • metric.delta gives you the change since the last report, not the cumulative value. CLS reports multiple times as shifts occur. Use delta if you are summing values server-side
  • metric.rating is "good", "needs-improvement", or "poor" based on Core Web Vitals thresholds
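Because CLS and INP can report more than once per page load, a server-side aggregator should sum deltas keyed by metric.id (each metric instance gets a unique id per page load). A hypothetical reducer:

```javascript
// Aggregate web-vitals beacons server-side. Each metric instance has a
// unique `id` per page load; summing `delta` per id reconstructs the
// final `value` even when a metric reports multiple times.
function aggregateBeacons(beacons) {
  const totals = new Map();
  for (const { id, name, delta } of beacons) {
    const entry = totals.get(id) ?? { name, value: 0 };
    entry.value += delta;
    totals.set(id, entry);
  }
  return totals;
}

// Two CLS reports from the same page load plus one LCP report.
const totals = aggregateBeacons([
  { id: 'v4-123', name: 'CLS', delta: 0.05 },
  { id: 'v4-123', name: 'CLS', delta: 0.07 },
  { id: 'v4-456', name: 'LCP', delta: 1850 },
]);

console.log(totals.get('v4-123').value); // 0.12 (within float precision)
console.log(totals.get('v4-456').value); // 1850
```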

Attribution Build for Debugging

The standard web-vitals build tells you what the metric value is. The attribution build tells you why.

import { onINP } from 'web-vitals/attribution';

onINP((metric) => {
  const attribution = metric.attribution;

  console.log('Slow interaction:', {
    eventTarget: attribution.interactionTarget,
    eventType: attribution.interactionType,
    inputDelay: attribution.inputDelay,
    processingDuration: attribution.processingDuration,
    presentationDelay: attribution.presentationDelay,
    longAnimationFrameEntries: attribution.longAnimationFrameEntries,
  });
});

The attribution build is larger (around 4KB gzipped) but invaluable for debugging. It breaks down INP into its three phases:

  • Input delay — Time from user interaction to when the event handler starts. Usually caused by long tasks blocking the main thread
  • Processing duration — Time spent in event handlers. Your code's fault
  • Presentation delay — Time from handler completion to next paint. Usually layout/style recalculation or rendering work
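With those three phases in hand, you can bucket slow interactions by their dominant phase before sending them to analytics. A hypothetical helper (the field names match the attribution object shown above):

```javascript
// Classify an INP attribution object by its dominant phase, so a
// dashboard can show whether main-thread contention (input delay),
// event handlers (processing), or rendering (presentation) dominates.
function dominantInpPhase(attribution) {
  const phases = {
    'input-delay': attribution.inputDelay,
    'processing': attribution.processingDuration,
    'presentation': attribution.presentationDelay,
  };
  return Object.entries(phases).reduce((worst, current) =>
    current[1] > worst[1] ? current : worst
  )[0];
}

console.log(dominantInpPhase({
  inputDelay: 10,
  processingDuration: 320,
  presentationDelay: 40,
})); // 'processing' — the event handler itself is slow
```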

Using PerformanceObserver Directly

For custom metrics beyond Core Web Vitals, you can use the PerformanceObserver API directly.

const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.entryType === 'largest-contentful-paint') {
      console.log('LCP candidate:', {
        element: entry.element,
        url: entry.url,
        startTime: entry.startTime,
        size: entry.size,
        renderTime: entry.renderTime,
        loadTime: entry.loadTime,
      });
    }
  }
});

observer.observe({ type: 'largest-contentful-paint', buffered: true });

The buffered: true option is critical — without it, you miss entries that occurred before the observer was registered. Since scripts typically load after the page has started rendering, many LCP candidates would be lost without buffering.

Why sendBeacon, not fetch?

When a user navigates away from your page, the browser cancels pending fetch requests. This is a problem because CLS and INP finalize their values on page visibility change or unload. If you collect metrics using fetch without keepalive: true, you lose the most important data point — the final metric value.

navigator.sendBeacon solves this. It is designed for "fire-and-forget" requests that must survive page unload: the browser queues the request and sends it even after the page is gone. The tradeoff: you cannot read the response, and the payload size is limited (typically 64KB). For performance telemetry, this is exactly what you want.

If sendBeacon is unavailable, fetch with keepalive: true is the fallback. The keepalive flag tells the browser to keep the request alive even after the page unloads, up to a cumulative 64KB limit across all keepalive requests.
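A common pattern is to batch metrics in memory and flush them in a single beacon when the page becomes hidden. A sketch with the transport injected so the sendBeacon-then-keepalive fallback stays swappable (the endpoint and wiring are illustrative):

```javascript
// Batch metrics and flush them in one request. `send` is injected so
// the transport (sendBeacon, keepalive fetch, or a test stub) can vary.
function createMetricQueue(send) {
  const queue = [];
  return {
    add(metric) {
      queue.push(metric);
    },
    flush() {
      if (queue.length === 0) return false;
      send(JSON.stringify(queue.splice(0))); // drain and send everything
      return true;
    },
  };
}

// Browser wiring (illustrative): flush when the tab is hidden, the last
// reliable moment before unload. sendBeacon returns false if its queue
// is full, in which case we fall back to keepalive fetch.
// const queue = createMetricQueue((body) => {
//   if (!navigator.sendBeacon?.('/api/vitals', body)) {
//     fetch('/api/vitals', { method: 'POST', body, keepalive: true });
//   }
// });
// onLCP((m) => queue.add(m));
// document.addEventListener('visibilitychange', () => {
//   if (document.visibilityState === 'hidden') queue.flush();
// });
```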

When to Use Each

This is not an either-or decision. Lab and field data serve fundamentally different purposes, and a mature performance practice uses both.

| Scenario | Use lab data? | Use field data? |
| --- | --- | --- |
| Debugging a specific performance bottleneck | Yes — waterfall, flame chart, frame analysis | No — too aggregate for root cause analysis |
| Measuring real-world user impact | No — conditions are synthetic | Yes — this IS the real-world impact |
| Pre-launch performance testing | Yes — no real users yet | No — no traffic to measure |
| Setting performance budgets | Yes — reproducible, automatable in CI | Yes — validate budgets against real conditions |
| Tracking performance regressions over time | Yes — consistent baseline for comparison | Yes — catches regressions that lab tests miss |
| Optimizing for a specific market (India, Brazil, Nigeria) | Partially — WebPageTest can test from those locations | Yes — CrUX and RUM show actual user experience there |
| Measuring INP (Interaction to Next Paint) | No — requires real user interactions | Yes — the only way to measure real INP |
| Competitive benchmarking | Yes — WebPageTest for controlled comparison | Yes — CrUX BigQuery for real-world comparison |
| CI/CD pipeline gates | Yes — Lighthouse CI can block deploys | No — too slow for build pipelines |

The Ideal Workflow

  1. Develop — Use DevTools Performance panel to profile as you build. Catch obvious issues early
  2. Pre-merge — Run Lighthouse CI in your CI/CD pipeline. Set performance budgets. Block merges that regress key metrics
  3. Post-deploy — Monitor field data via RUM (web-vitals library + your analytics backend). Watch for regressions that lab tests missed
  4. Investigate — When field data shows a regression, use WebPageTest and DevTools to reproduce and diagnose
  5. Validate — After fixing, confirm improvement in both lab (immediate feedback) and field (delayed but authoritative) data
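Step 2 can be wired up with Lighthouse CI assertions. A hypothetical lighthouserc.js sketch — the budget values are examples, and the option names follow the Lighthouse CI docs, so verify them against your installed version:

```javascript
// lighthouserc.js — example Lighthouse CI config with budget assertions.
module.exports = {
  ci: {
    collect: {
      url: ['http://localhost:3000/', 'http://localhost:3000/checkout'],
      numberOfRuns: 3, // median of 3 runs smooths lab variance
    },
    assert: {
      assertions: {
        'categories:performance': ['error', { minScore: 0.9 }],
        'largest-contentful-paint': ['error', { maxNumericValue: 2500 }],
        'cumulative-layout-shift': ['error', { maxNumericValue: 0.1 }],
        'total-blocking-time': ['warn', { maxNumericValue: 300 }],
      },
    },
    upload: { target: 'temporary-public-storage' },
  },
};
```

Note that the TBT assertion is only a warning here: as discussed above, TBT is a load-time proxy and passing it does not mean real-user INP will pass.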
Quiz
Your CI pipeline runs Lighthouse on every pull request and the scores are consistently green. After deploying, your RUM data shows INP failures on the checkout page. Why did CI not catch this?

Building a Complete Performance Monitoring Stack

A production-grade performance monitoring setup combines lab and field data into a single workflow.

Layer 1: Lab (Development + CI)

  • Chrome DevTools Performance panel during development
  • Lighthouse CI in your CI/CD pipeline with performance budgets
  • WebPageTest for deep-dive investigations and competitor analysis
  • Custom DevTools recordings for specific user flows

Layer 2: Field (Production)

  • web-vitals library collecting Core Web Vitals from all users
  • Attribution build enabled for a sample of traffic (not all — it adds bundle size)
  • Data pipeline to your analytics backend (BigQuery, Datadog, custom)
  • Dashboards segmented by device type, connection speed, geography, and page

Layer 3: Alerting

  • Alert on P75 regressions (Core Web Vitals threshold crossings)
  • Alert on P95 regressions (tail performance degradation)
  • Alert on lab regression (Lighthouse CI budget failures)
  • Weekly reports comparing lab trends vs field trends
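An alert rule from the list above can be as simple as comparing the current percentile value to a baseline plus a tolerance. A hypothetical check:

```javascript
// Flag a regression when the current percentile value exceeds the
// baseline by more than `tolerancePct` percent. Works for P75 or P95.
function isRegression(baseline, current, tolerancePct = 5) {
  return current > baseline * (1 + tolerancePct / 100);
}

console.log(isRegression(2400, 2450)); // false — within 5% of baseline
console.log(isRegression(2400, 2700)); // true — ~12.5% slower
```

In practice you would also require a minimum sample count before alerting, since small-sample percentiles are noisy.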
Key Rules
  1. Lab data is for debugging and prevention. Field data is for truth and validation. Use both.
  2. Lighthouse simulated throttling does not replicate real network conditions — never trust a lab score as your users' reality.
  3. CrUX only includes Chrome users with sync enabled — supplement with your own RUM for full browser coverage.
  4. Always use navigator.sendBeacon or fetch with keepalive for metric collection — regular fetch loses data on page unload.
  5. P75 is Google's threshold, but P95 reveals your worst user experiences. Track both.
  6. The web-vitals attribution build is essential for diagnosing WHY a metric is slow, not just that it is slow.
  7. INP cannot be meaningfully measured in lab conditions — it requires real user interactions in the field.
| What developers do | Why it's a problem | What they should do |
| --- | --- | --- |
| Treating Lighthouse score as the definitive measure of site performance | Lighthouse runs in synthetic conditions that do not represent the diversity of real user devices, networks, and behaviors | Use Lighthouse for debugging and CI gates, but validate with CrUX and RUM for real-world performance |
| Only tracking P75 because that is what Google uses for ranking | P75 passing means 25% of users may still have a poor experience. At scale, that is millions of bad page loads | Track P75, P95, and P99 to understand the full distribution of user experience |
| Using fetch without keepalive to send performance metrics | CLS and INP report final values on page unload. Regular fetch requests are canceled when the user navigates away, losing the most critical data | Use navigator.sendBeacon or fetch with keepalive: true |
| Assuming CrUX data represents all your users | CrUX only includes opted-in Chrome users. Safari (iOS), Firefox, and other browsers are excluded entirely | Supplement CrUX with your own RUM implementation that covers all browsers |
| Running Lighthouse in CI and assuming INP is covered because TBT passes | TBT is a lab proxy that only measures main-thread blocking during load. It cannot capture real user interaction responsiveness | Implement field-based INP monitoring with the web-vitals library |
Quiz
You are setting up performance monitoring for a new production site. Which combination gives you the most complete picture?