
OpenTelemetry on Frontend

Intermediate · 20 min read

The Observability Gap Between Frontend and Backend

Your backend team has beautiful distributed traces. They can follow a request from the API gateway through three microservices, into the database, and back. They know exactly where time is spent and where bottlenecks live.

Your frontend? A black box. The backend trace starts when the API receives the request. But the user's experience started 500ms earlier — with a click, a state update, a React render, and a fetch call. That 500ms gap is invisible to your backend traces.

OpenTelemetry bridges this gap. By instrumenting the frontend with the same tracing standard your backend uses, you create end-to-end traces that span the entire journey: from user click in the browser, through the network, into your API, through your services, and back to the painted response on screen.

Mental Model

Think of distributed tracing like a package tracking system. When you order something online, you get a tracking number that follows the package from the warehouse, to the sorting facility, to the delivery truck, to your door. Every handoff is recorded. Without frontend tracing, your tracking starts at the sorting facility — you have no idea when the package was actually ordered or how long it sat in the warehouse. Frontend OpenTelemetry gives you the full journey, from click to delivery.

OpenTelemetry Concepts in 60 Seconds

Before writing code, let us nail down four concepts:

  • Trace — The complete journey of a request. A single trace might span browser, CDN, API, database, and cache.
  • Span — One unit of work within a trace. "React render CourseList" is a span. "fetch /api/courses" is a span. "PostgreSQL query" is a span. Spans can be nested (a parent span with child spans).
  • Context — Metadata that travels with the trace. The most important piece is the trace ID — a unique identifier that links all spans in a trace together. When the frontend sends a request to the backend, it propagates the trace ID via HTTP headers so the backend can attach its spans to the same trace.
  • Exporter — Where traces get sent for visualization. Jaeger, Grafana Tempo, Honeycomb, and Datadog all accept OpenTelemetry data.
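To make the trace/span relationship concrete, here is a toy model — deliberately not the SDK's real types — showing the one invariant that matters: every span carries the trace ID minted by its root, and each child span references its parent:

```typescript
// Toy model (not the SDK): spans share a trace ID; child spans point
// at their parent, which is how a trace viewer rebuilds the tree.
interface Span {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  name: string;
}

let nextId = 0;
const newId = () => (++nextId).toString(16).padStart(8, '0');

function startSpan(name: string, parent?: Span): Span {
  return {
    traceId: parent ? parent.traceId : newId(), // the root span mints the trace ID
    spanId: newId(),
    parentSpanId: parent?.spanId,
    name,
  };
}

const click = startSpan('click - button#enroll');
const fetchSpan = startSpan('HTTP POST /api/enroll', click);
// fetchSpan.traceId === click.traceId — both belong to one trace
```

Context propagation, covered later, is just this invariant carried across a network boundary: the frontend hands the backend its trace ID so the backend's spans join the same tree.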

Setting Up the SDK

The OpenTelemetry JavaScript SDK for browsers uses @opentelemetry/sdk-trace-web as the core tracing package and auto-instrumentation plugins for common browser operations.

pnpm add @opentelemetry/api \
  @opentelemetry/sdk-trace-web \
  @opentelemetry/instrumentation-fetch \
  @opentelemetry/instrumentation-document-load \
  @opentelemetry/instrumentation-user-interaction \
  @opentelemetry/context-zone \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/resources \
  @opentelemetry/semantic-conventions

// lib/tracing.ts
import { WebTracerProvider, BatchSpanProcessor } from '@opentelemetry/sdk-trace-web';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { ZoneContextManager } from '@opentelemetry/context-zone';
import { FetchInstrumentation } from '@opentelemetry/instrumentation-fetch';
import { DocumentLoadInstrumentation } from '@opentelemetry/instrumentation-document-load';
import { UserInteractionInstrumentation } from '@opentelemetry/instrumentation-user-interaction';
import { registerInstrumentations } from '@opentelemetry/instrumentation';
import { Resource } from '@opentelemetry/resources';
import {
  ATTR_SERVICE_NAME,
  ATTR_SERVICE_VERSION,
} from '@opentelemetry/semantic-conventions';

const resource = new Resource({
  [ATTR_SERVICE_NAME]: 'learn-infinity-web',
  [ATTR_SERVICE_VERSION]: process.env.NEXT_PUBLIC_APP_VERSION ?? 'dev',
});

const provider = new WebTracerProvider({ resource });

const exporter = new OTLPTraceExporter({
  url: '/api/traces',
});

provider.addSpanProcessor(new BatchSpanProcessor(exporter));

provider.register({
  contextManager: new ZoneContextManager(),
});

registerInstrumentations({
  instrumentations: [
    new FetchInstrumentation({
      propagateTraceHeaderCorsUrls: [/api\.example\.com/],
      clearTimingResources: true,
    }),
    new DocumentLoadInstrumentation(),
    new UserInteractionInstrumentation({
      eventNames: ['click', 'submit'],
    }),
  ],
});

export { provider };

Let us break down the important pieces:

The Provider

WebTracerProvider is the core of the tracing system. It creates spans, manages context, and routes completed spans to processors. The Resource metadata (service name, version) is attached to every span, so when you view traces in Jaeger, you can filter by service.

The Exporter

OTLPTraceExporter sends spans to an OpenTelemetry-compatible collector over HTTP. We point it to /api/traces — a Next.js API route that proxies to your actual collector. This avoids CORS issues and prevents ad blockers from blocking telemetry requests to third-party domains.

The Context Manager

ZoneContextManager uses zone.js to propagate trace context across asynchronous operations. Without it, when a fetch fires inside an async callback, the SDK cannot link it to the parent span because JavaScript's async execution model does not carry context automatically. Zone.js patches async APIs to maintain the context chain.
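A pure-TypeScript illustration (no OpenTelemetry involved) of the problem zone.js solves: a naive module-level "current span" is clobbered as soon as two async operations interleave on the event loop:

```typescript
// Why async context needs help: a global variable cannot track "the
// current span" once two async chains run concurrently.
let currentSpan: string | null = null;

async function withSpan(name: string): Promise<string> {
  currentSpan = name;                          // "enter" the span
  await new Promise((r) => setTimeout(r, 0));  // yield to the event loop
  return currentSpan as string;                // whoever wrote last wins
}

Promise.all([withSpan('span-A'), withSpan('span-B')]).then(([a, b]) => {
  // Both report 'span-B': span-A's context was clobbered while it awaited.
  console.log(a, b);
});
```

ZoneContextManager avoids this by patching timers, promises, and event handlers so each async chain carries its own context, which is what lets the SDK attach a fetch span fired inside a callback to the correct parent.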

Quiz
You set up OpenTelemetry with the FetchInstrumentation but notice that fetch requests to your API do not include the traceparent header. Your API is on a different subdomain (api.example.com). What is the issue?

Auto-Instrumentation: What You Get for Free

With the three instrumentation plugins registered, you automatically get spans for:

Document Load

Traces the entire page load lifecycle:

Span: documentLoad
  ├── Span: resourceFetch (style.css)       [12ms]
  ├── Span: resourceFetch (main.js)          [45ms]
  ├── Span: resourceFetch (chunk-abc.js)     [38ms]
  └── Span: documentFetch                    [180ms]
      attributes:
        http.url: https://example.com/courses
        document.domContentLoaded: 320ms
        document.load: 890ms

Fetch Requests

Every fetch() call becomes a span with HTTP metadata:

Span: HTTP GET /api/courses
  attributes:
    http.method: GET
    http.url: https://api.example.com/courses
    http.status_code: 200
    http.response_content_length: 14532
    http.duration: 142ms

User Interactions

Click and submit events create spans:

Span: click - button#enroll-btn
  ├── Span: HTTP POST /api/enroll          [230ms]
  └── Span: HTTP GET /api/courses/42       [180ms]

Now you can see that a "click enroll" interaction triggered two API calls and took 410ms total. Without tracing, you would just know "something feels slow."

Custom Spans: Measuring What Matters

Auto-instrumentation captures HTTP and DOM events, but it does not know about your application's internal operations. Custom spans fill that gap.

import { trace } from '@opentelemetry/api';

const tracer = trace.getTracer('learn-infinity-web');

function renderCourseList(courses: Course[], selectedLevel: string) {
  return tracer.startActiveSpan('renderCourseList', (span) => {
    span.setAttribute('course.count', courses.length);

    const sorted = tracer.startActiveSpan('sortCourses', (sortSpan) => {
      // Copy first so the caller's array is not mutated.
      const result = [...courses].sort((a, b) => b.rating - a.rating);
      sortSpan.end();
      return result;
    });

    const filtered = tracer.startActiveSpan('filterByLevel', (filterSpan) => {
      const result = sorted.filter((c) => c.level === selectedLevel);
      filterSpan.setAttribute('filtered.count', result.length);
      filterSpan.end();
      return result;
    });

    span.end();
    return filtered;
  });
}

This creates a parent span renderCourseList with two child spans: sortCourses and filterByLevel. In your trace viewer, you see exactly how much time each operation takes and what data it processed.

Error Recording in Spans

When an operation fails, record the error on the span:

import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('learn-infinity-web');

async function enrollInCourse(courseId: string) {
  return tracer.startActiveSpan('enrollInCourse', async (span) => {
    span.setAttribute('course.id', courseId);

    try {
      const response = await fetch(`/api/courses/${courseId}/enroll`, {
        method: 'POST',
      });

      if (!response.ok) {
        span.setStatus({
          code: SpanStatusCode.ERROR,
          message: `Enrollment failed: ${response.status}`,
        });
      }

      span.setAttribute('http.status', response.status);
      return response.json();
    } catch (error) {
      span.setStatus({ code: SpanStatusCode.ERROR });
      span.recordException(error as Error);
      throw error;
    } finally {
      span.end();
    }
  });
}

The recordException method attaches the full error (message, stack trace) to the span. In Jaeger, error spans show up in red, making it easy to find failures in a complex trace.

Quiz
You create a custom span for a data transformation that takes 2ms. In production, this function is called 500 times per page load (once per list item). What is the performance concern?

Context Propagation: The W3C Trace Context

Context propagation is what makes distributed tracing work. When the frontend sends a request to the backend, it includes a traceparent header:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01

This header contains:

  • Version — 00 (always)
  • Trace ID — 4bf92f3577b34da6a3ce929d0e0e4736 (links all spans in the trace)
  • Parent Span ID — 00f067aa0ba902b7 (the span that made this request)
  • Trace Flags — 01 (sampled = yes)

When your backend receives this header, it creates its spans with the same trace ID. The result: a single trace that spans browser and server, visible in one timeline.
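For illustration only, the header can be pulled apart with a few string operations — in practice the SDK's W3C propagator handles parsing and validation for you:

```typescript
// Minimal traceparent parser, for illustration. The field widths come
// from the W3C Trace Context spec: 32 hex chars of trace ID, 16 of span ID.
interface TraceParent {
  version: string;
  traceId: string;
  parentSpanId: string;
  sampled: boolean;
}

function parseTraceparent(header: string): TraceParent | null {
  const parts = header.trim().split('-');
  if (parts.length !== 4) return null;
  const [version, traceId, parentSpanId, flags] = parts;
  if (traceId.length !== 32 || parentSpanId.length !== 16) return null;
  return {
    version,
    traceId,
    parentSpanId,
    // Bit 0 of the flags byte is the "sampled" flag.
    sampled: (parseInt(flags, 16) & 0x01) === 0x01,
  };
}

const ctx = parseTraceparent(
  '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'
);
// ctx?.traceId === '4bf92f3577b34da6a3ce929d0e0e4736', ctx?.sampled === true
```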

[Browser]
├── click - button#search
│   └── HTTP GET /api/search?q=react      ─── traceparent ───►
│
[API Gateway]
├── handleRequest /api/search
│   └── searchIndex.query("react")
│       └── PostgreSQL SELECT ... WHERE ...
│
[Browser]
└── renderSearchResults (15 items)

Now you can see: the user clicked search, the browser sent a request (50ms network), the API processed it (80ms with 60ms in Postgres), and the browser rendered results (30ms). Total: 160ms. Without the trace, you would just know "search took 160ms" with no idea where the time went.

W3C Trace Context vs B3 headers

The traceparent header follows the W3C Trace Context standard, which is the industry default. Some older systems use B3 headers (from Zipkin), which split the trace ID and span ID into separate headers: X-B3-TraceId, X-B3-SpanId, X-B3-Sampled. If your backend uses B3, you can configure the OpenTelemetry SDK to propagate both formats simultaneously using a CompositePropagator. New systems should always use W3C Trace Context — it is the standard, it is supported by every major tracing backend, and it uses a single header instead of three.
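As a hedged sketch of the dual-format setup, assuming the provider from the earlier lib/tracing.ts and the @opentelemetry/core and @opentelemetry/propagator-b3 packages are installed:

```typescript
// Propagate both W3C traceparent and Zipkin-style B3 headers, for
// backends that still expect B3. Builds on the provider defined earlier.
import { CompositePropagator, W3CTraceContextPropagator } from '@opentelemetry/core';
import { B3Propagator, B3InjectEncoding } from '@opentelemetry/propagator-b3';
import { ZoneContextManager } from '@opentelemetry/context-zone';

provider.register({
  contextManager: new ZoneContextManager(),
  propagator: new CompositePropagator({
    propagators: [
      new W3CTraceContextPropagator(),
      // MULTI_HEADER emits the separate X-B3-TraceId / X-B3-SpanId /
      // X-B3-Sampled headers rather than the single b3 header.
      new B3Propagator({ injectEncoding: B3InjectEncoding.MULTI_HEADER }),
    ],
  }),
});
```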

Exporting to Jaeger or Grafana Tempo

Jaeger

Jaeger is a popular open-source tracing backend. For local development, run it with Docker:

docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

Port 16686 is the Jaeger UI. Port 4318 is the OTLP HTTP receiver. Point your exporter to it:

const exporter = new OTLPTraceExporter({
  url: 'http://localhost:4318/v1/traces',
});

Grafana Tempo

Tempo is Grafana's distributed tracing backend, designed to work with Grafana dashboards. It accepts OTLP natively:

const exporter = new OTLPTraceExporter({
  url: 'https://tempo.your-infra.com/v1/traces',
  headers: {
    Authorization: `Bearer ${process.env.NEXT_PUBLIC_TEMPO_TOKEN}`,
  },
});

The Collector Proxy Pattern

In production, do not point the browser exporter directly at Jaeger or Tempo. Use a Next.js API route as a proxy:

// app/api/traces/route.ts
export async function POST(request: Request) {
  const body = await request.arrayBuffer();

  const response = await fetch(process.env.OTEL_COLLECTOR_URL!, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OTEL_COLLECTOR_TOKEN}`,
    },
    body,
  });

  return new Response(null, { status: response.status });
}

This pattern has three benefits: it hides the collector URL from the browser, it prevents ad blockers from blocking telemetry (same-origin requests are not blocked), and it lets you add authentication without exposing tokens in client-side code.

Quiz
You deploy OpenTelemetry on your frontend with a sampling ratio of 1.0 (100% of traces captured). Your site handles 10 million page views per day. What problem will you encounter?

Sampling Strategies

Not every page load needs to be traced. Sampling controls how many traces you capture.

Head-Based Sampling

The sampling decision is made when the trace starts (at the root span). Simple and predictable:

import { TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-web';

const provider = new WebTracerProvider({
  resource,
  sampler: new TraceIdRatioBasedSampler(0.05),
});

This samples 5% of traces. The decision propagates to all child spans and to the backend (via the trace flags in the traceparent header). If the frontend decides not to sample, the backend skips tracing for that request too.
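One practical refinement, sketched here under the assumption that the re-exports from @opentelemetry/sdk-trace-web are available: the ratio sampler is commonly wrapped in a ParentBasedSampler so that spans created under an existing parent follow the parent's decision instead of re-rolling the dice mid-trace:

```typescript
// Only the root span consults the ratio; child spans inherit the
// parent's sampled/not-sampled decision, keeping traces complete.
import {
  WebTracerProvider,
  ParentBasedSampler,
  TraceIdRatioBasedSampler,
} from '@opentelemetry/sdk-trace-web';

const provider = new WebTracerProvider({
  sampler: new ParentBasedSampler({
    root: new TraceIdRatioBasedSampler(0.05), // 5% of new traces
  }),
});
```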

Conditional Sampling

Sometimes you want to always trace certain operations regardless of the sample rate:

import {
  Sampler,
  SamplingResult,
  SamplingDecision,
} from '@opentelemetry/sdk-trace-web';
import type { Context, SpanKind, Attributes, Link } from '@opentelemetry/api';

class ConditionalSampler implements Sampler {
  shouldSample(
    context: Context,
    traceId: string,
    name: string,
    spanKind: SpanKind,
    attributes: Attributes,
    links: Link[]
  ): SamplingResult {
    // Always record critical business flows, regardless of the ratio.
    if (name.includes('checkout') || name.includes('enrollment')) {
      return { decision: SamplingDecision.RECORD_AND_SAMPLED };
    }

    // Deterministic 5% sample derived from the trace ID, so every span
    // in a trace reaches the same decision.
    const hash = parseInt(traceId.substring(0, 8), 16);
    const shouldSample = hash % 100 < 5;

    return {
      decision: shouldSample
        ? SamplingDecision.RECORD_AND_SAMPLED
        : SamplingDecision.NOT_RECORD,
    };
  }

  toString() {
    return 'ConditionalSampler';
  }
}

This always traces checkout and enrollment flows (critical paths) while sampling 5% of everything else.

Key Rules
  1. Use 1-10% sampling for production traffic — full sampling overwhelms storage and adds unnecessary cost
  2. Always trace critical user journeys (checkout, enrollment, payment) regardless of sample rate
  3. Use head-based sampling so the decision propagates to all services — partial traces are useless
  4. Proxy telemetry through your own API route to avoid ad blockers and hide collector credentials
What developers do vs. what they should do

  • Sending traces directly from the browser to an external collector → Proxy through a same-origin API route. Direct requests to third-party collector domains are blocked by ad blockers, leak the collector URL to the browser, and require exposing auth tokens in client code.
  • Creating spans for every tiny operation (each list item render, each CSS class toggle) → Create spans for meaningful operations: fetch requests, component mounts, user interactions, data transformations. Excessive spans add overhead in the browser and flood the trace viewer; a trace with 1000 spans is unreadable.
  • Not configuring propagateTraceHeaderCorsUrls for cross-origin API calls → Set the regex to match all your API domains. Without it, the SDK will not add the traceparent header to cross-origin requests, breaking the connection between frontend and backend traces.
  • Forgetting to call span.end() in error paths → Use try/finally to ensure span.end() is always called. An un-ended span leaks memory and is never exported; the trace viewer shows it as an incomplete operation.

The Full Picture

OpenTelemetry on the frontend is not about replacing your existing monitoring — it is about completing the picture. Sentry catches errors. Web Vitals measure page performance. OpenTelemetry shows you the full journey of every user interaction, from click to response, across every service boundary.

The magic moment is when a user reports "the search is slow" and you open a single trace that shows: 50ms in the browser preparing the request, 30ms network to the API, 400ms in Elasticsearch, 20ms serializing the response, and 80ms rendering results. The bottleneck is obvious. The fix is targeted. No guessing, no "let me add some console.logs and try to reproduce it."

That is observability. Not just knowing something is broken, but seeing exactly where and why — across the entire stack, in a single view.