Tool Calls and Function Calling UI

Advanced · 22 min read

The Chatbot That Could Actually Do Things

You build a chatbot. It can answer questions, summarize text, write code. Impressive for about five minutes. Then your user types "What's the weather in Tokyo?" and the model confidently hallucinates a number. It has no idea what the weather is. It's a text generator, not a weather station.

Tool calling changes everything. Instead of guessing, the model says: "I need to call a weather API with the argument Tokyo." Your code executes that API call, sends the result back, and the model responds with actual, real-time data. The LLM didn't execute anything. It just asked, politely, in structured JSON.

This is how you go from chatbot to agent. And building the frontend for this requires understanding a loop that most tutorials gloss over.

The Mental Model

Think of the LLM as a brilliant strategist who can't use their hands. They can look at a situation, decide exactly what needs to happen, and write down precise instructions on a card: "Call this function with these arguments." But they can't pick up the phone themselves. You (the frontend/backend code) are the hands. You read the card, execute the action, write the result on a new card, and slide it back. The strategist reads the result and decides what to say next -- or writes another instruction card. That's the entire tool calling loop.

The Tool Calling Loop

Every tool call follows the same multi-turn dance, regardless of which LLM provider you use.

The critical insight: the LLM never executes tools directly. It outputs a structured request (tool name + arguments as JSON), your code runs it, and you feed the result back. The model is a decision engine, not an execution engine.

Let's trace through a concrete example:

Execution Trace

  1. User sends message -- "What's the weather in Tokyo and should I bring an umbrella?" This requires real-time data the model doesn't have.
  2. LLM returns tool_use -- { name: 'get_weather', args: { city: 'Tokyo' } }. The model chose the tool and extracted the city argument from natural language.
  3. Your code executes -- fetch('api.weather.com/Tokyo') returns { temp: 18, condition: 'rain' }. The actual HTTP call happens in your code, not inside the model.
  4. Send tool_result -- { tool_call_id: 'xyz', result: { temp: 18, condition: 'rain' } }. The result goes back into the conversation as a tool_result message.
  5. LLM generates response -- "It's 18C and raining in Tokyo. Definitely bring an umbrella!" The model now has real data and can answer accurately.
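The trace above can be sketched end to end. This is a minimal simulation, not a real provider integration: fakeLLM stands in for the LLM API and getWeather for the HTTP call, so the only real thing here is the loop mechanics.

```typescript
type Msg =
  | { role: "user" | "assistant"; content: string }
  | { role: "tool"; toolCallId: string; content: string };

type LLMReply =
  | { toolCall: { id: string; name: string; args: { city: string } }; text: null }
  | { toolCall: null; text: string };

// Stub model: requests the tool on the first turn, answers once it sees a result
function fakeLLM(messages: Msg[]): LLMReply {
  const hasToolResult = messages.some(m => m.role === "tool");
  return hasToolResult
    ? { toolCall: null, text: "It's 18C and raining in Tokyo. Definitely bring an umbrella!" }
    : { toolCall: { id: "xyz", name: "get_weather", args: { city: "Tokyo" } }, text: null };
}

// Stub tool: stands in for the real HTTP call your code would make
function getWeather(_args: { city: string }) {
  return { temp: 18, condition: "rain" };
}

function runToolLoop(userText: string): string {
  const messages: Msg[] = [{ role: "user", content: userText }];
  let reply = fakeLLM(messages);
  while (reply.toolCall) {
    const result = getWeather(reply.toolCall.args);      // your code executes the tool
    messages.push({
      role: "tool",
      toolCallId: reply.toolCall.id,
      content: JSON.stringify(result),                   // result goes back as a message
    });
    reply = fakeLLM(messages);                           // model reads result, responds
  }
  return reply.text;
}
```

Swap fakeLLM for a real provider call and getWeather for a real fetch, and this is the loop every agent framework implements for you.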
Quiz
In the tool calling loop, who actually executes the tool function?

Defining Tools: The Schema Contract

Before the LLM can call tools, you need to tell it what tools exist. This is done through JSON Schema definitions that describe each tool's name, purpose, and parameter types.

Here's what a tool definition looks like across the major providers:

const weatherTool = {
  name: "get_weather",
  description: "Get current weather for a city. Use when the user asks about weather conditions, temperature, or forecasts.",
  parameters: {
    type: "object",
    properties: {
      city: {
        type: "string",
        description: "City name, e.g. 'Tokyo' or 'San Francisco'"
      },
      units: {
        type: "string",
        enum: ["celsius", "fahrenheit"],
        description: "Temperature unit preference"
      }
    },
    required: ["city"]
  }
};

With Zod (the TypeScript way), you get runtime validation for free:

import { z } from "zod";

const WeatherParams = z.object({
  city: z.string().describe("City name"),
  units: z.enum(["celsius", "fahrenheit"]).default("celsius")
});

type WeatherParams = z.infer<typeof WeatherParams>;

The schema serves double duty: it tells the LLM what arguments to produce, and it gives you a validation contract for the arguments you receive. Never trust the model's output without validation -- it can produce malformed JSON, missing fields, or hallucinated parameter values.
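To make that concrete, here is a dependency-free sketch of the validation step. In practice you would just call WeatherParams.safeParse from the Zod schema above; the helper name here is made up.

```typescript
type WeatherArgs = { city: string; units: "celsius" | "fahrenheit" };

// Validate raw model output before it ever reaches your weather API
function validateWeatherArgs(raw: unknown): WeatherArgs {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("Tool arguments must be an object");
  }
  const r = raw as Record<string, unknown>;
  if (typeof r.city !== "string" || r.city.length === 0) {
    throw new Error("'city' must be a non-empty string");
  }
  const units = r.units ?? "celsius";                  // apply the schema default
  if (units !== "celsius" && units !== "fahrenheit") {
    throw new Error("'units' must be 'celsius' or 'fahrenheit'");
  }
  return { city: r.city, units };
}
```

A thrown validation error here is also a good candidate to send back to the model as an error result, so it can correct its arguments and retry.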

Common Trap

Tool descriptions matter more than you think. The LLM uses the description to decide when to call a tool, not just how. A vague description like "gets data" means the model will either never call it or call it at the wrong time. Be specific: "Get current weather conditions for a city. Returns temperature, humidity, and precipitation. Use when the user asks about weather, temperature, or whether to bring a jacket."

Provider Formats: Anthropic vs OpenAI

The tool calling loop is universal, but the wire format differs between providers. Here's what the actual messages look like:

Anthropic (Claude)

// Tool call from Claude — a content block with type "tool_use"
{
  role: "assistant",
  content: [
    { type: "text", text: "Let me check the weather for you." },
    {
      type: "tool_use",
      id: "toolu_01ABC123",
      name: "get_weather",
      input: { city: "Tokyo", units: "celsius" }
    }
  ]
}

// Your tool result — sent as a user message with tool_result content
{
  role: "user",
  content: [
    {
      type: "tool_result",
      tool_use_id: "toolu_01ABC123",
      content: JSON.stringify({ temp: 18, condition: "rain" })
    }
  ]
}

OpenAI

// Tool call from GPT — tool_calls array on the assistant message
{
  role: "assistant",
  tool_calls: [
    {
      id: "call_abc123",
      type: "function",
      function: {
        name: "get_weather",
        arguments: '{"city":"Tokyo","units":"celsius"}'
      }
    }
  ]
}

// Your tool result — a message with role "tool"
{
  role: "tool",
  tool_call_id: "call_abc123",
  content: JSON.stringify({ temp: 18, condition: "rain" })
}

Notice the differences: Anthropic uses tool_use content blocks with an input object, while OpenAI uses a tool_calls array where arguments are a JSON string (yes, a string -- you need to JSON.parse it). Anthropic sends tool results as user messages with tool_result content blocks; OpenAI uses a dedicated tool role.
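If you support both providers, it helps to normalize their wire formats into one internal shape as soon as messages arrive. A sketch -- the NormalizedToolCall type and function names are made up, but the field shapes follow the examples above:

```typescript
type NormalizedToolCall = { id: string; name: string; args: Record<string, unknown> };

// Anthropic: arguments arrive as an already-parsed `input` object
function fromAnthropicBlock(block: {
  id: string; name: string; input: Record<string, unknown>;
}): NormalizedToolCall {
  return { id: block.id, name: block.name, args: block.input };
}

// OpenAI: `arguments` is a JSON *string*, so it must be parsed
function fromOpenAICall(call: {
  id: string; function: { name: string; arguments: string };
}): NormalizedToolCall {
  return {
    id: call.id,
    name: call.function.name,
    args: JSON.parse(call.function.arguments) as Record<string, unknown>,
  };
}
```

Everything downstream (validation, execution, UI state) can then work against the one normalized shape instead of branching on provider.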

Quiz
In the Anthropic API format, how is a tool result sent back to the model?

Streaming Tool Inputs

Here's where it gets interesting for frontend developers. When the model streams a response, tool call arguments don't arrive all at once -- they arrive as partial JSON chunks:

chunk 1: {"city":
chunk 2: "Tok
chunk 3: yo","un
chunk 4: its":"cel
chunk 5: sius"}

You need to accumulate these chunks into valid JSON before you can execute the tool. The Vercel AI SDK handles this with specific stream events:

// Vercel AI SDK stream events for tool calls
for await (const event of stream) {
  switch (event.type) {
    case "tool-input-start":
      // A new tool call is beginning
      // event.toolName: "get_weather"
      // event.toolCallId: "call_123"
      break;

    case "tool-input-delta":
      // Partial JSON chunk for the tool's arguments
      // event.delta: '{"city":"Tok'
      // Accumulate this into a buffer
      break;

    case "tool-input-available":
      // All argument chunks received — full JSON is now parseable
      // event.args: { city: "Tokyo", units: "celsius" }
      // NOW you can execute the tool
      break;

    case "tool-output-available":
      // Tool execution complete, result available
      // event.output: { temp: 18, condition: "rain" }
      break;
  }
}

The key states for your UI:

  1. tool-input-start -- show "Deciding which tool to use..." or the tool name
  2. tool-input-delta -- optionally show the arguments building up (great for transparency)
  3. tool-input-available -- show "Running get_weather..." and actually execute
  4. tool-output-available -- show the result inline
Info

You cannot execute a tool until all input chunks have arrived. Attempting to parse partial JSON will throw. Accumulate deltas into a string buffer and only parse when you receive the tool-input-available event (or equivalent completion signal from your provider).
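A buffer like the following is all the accumulation step needs. Keying by tool call id also handles the case where the model streams several tool calls in one turn (the class name is made up):

```typescript
// Accumulates streamed argument deltas per tool call; parse only on completion
class ToolArgAccumulator {
  private buffers = new Map<string, string>();

  appendDelta(toolCallId: string, delta: string): void {
    this.buffers.set(toolCallId, (this.buffers.get(toolCallId) ?? "") + delta);
  }

  // Call only after the provider's input-complete signal for this tool call
  finish(toolCallId: string): Record<string, unknown> {
    const raw = this.buffers.get(toolCallId);
    if (raw === undefined) throw new Error(`Unknown tool call id: ${toolCallId}`);
    this.buffers.delete(toolCallId);
    return JSON.parse(raw) as Record<string, unknown>;
  }
}
```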

Building the Tool Execution UI

This is where most AI chat interfaces fail. They either hide tool calls entirely (users don't trust the response) or dump raw JSON (users are overwhelmed). The sweet spot is a purpose-built UI that shows what's happening without requiring the user to understand the mechanics.

The State Machine

Every tool call goes through predictable states. Model this explicitly:

type ToolCallState =
  | { status: "pending"; toolName: string }
  | { status: "streaming_args"; toolName: string; partialArgs: string }
  | { status: "executing"; toolName: string; args: Record<string, unknown> }
  | { status: "complete"; toolName: string; result: unknown }
  | { status: "error"; toolName: string; error: string };

Each state maps to a distinct UI treatment:

function ToolCallCard({ state }: { state: ToolCallState }) {
  switch (state.status) {
    case "pending":
      return <ToolPending name={state.toolName} />;

    case "streaming_args":
      return <ToolStreamingArgs name={state.toolName} partial={state.partialArgs} />;

    case "executing":
      return <ToolExecuting name={state.toolName} args={state.args} />;

    case "complete":
      return <ToolResult name={state.toolName} result={state.result} />;

    case "error":
      return <ToolError name={state.toolName} error={state.error} />;
  }
}
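One payoff of the explicit union is that you can also enforce which transitions are legal, catching impossible sequences (like a result arriving before arguments finish streaming) as bugs instead of silent UI glitches. A sketch; the transition table is an assumption based on the flow described above:

```typescript
type ToolStatus = "pending" | "streaming_args" | "executing" | "complete" | "error";

// Which statuses may follow which (errors can occur at any active stage)
const NEXT: Record<ToolStatus, ToolStatus[]> = {
  pending: ["streaming_args", "executing", "error"],
  streaming_args: ["executing", "error"],
  executing: ["complete", "error"],
  complete: [],
  error: [],
};

function transition(from: ToolStatus, to: ToolStatus): ToolStatus {
  if (!NEXT[from].includes(to)) {
    throw new Error(`Illegal tool call transition: ${from} -> ${to}`);
  }
  return to;
}
```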

Visual Patterns That Work

Here's what the best AI products do:

Inline expansion -- The tool call appears as a collapsible card within the chat stream. Collapsed shows: tool icon + name + status. Expanded shows: arguments sent, result received, execution time.

Progressive disclosure -- Users who want transparency can expand. Users who just want the answer see a minimal indicator that the model "looked something up."

Contextual icons -- Different tool types get different icons: a globe for web searches, a database cylinder for queries, a code bracket for code execution, a chart for data analysis. This instantly communicates what happened.

function ToolIcon({ toolName }: { toolName: string }) {
  const icons: Record<string, string> = {
    web_search: "search",
    query_database: "database",
    run_code: "code",
    get_weather: "cloud",
    send_email: "mail",
  };
  return <Icon name={icons[toolName] ?? "tool"} />;
}
Quiz
Why should tool execution be modeled as an explicit state machine rather than simple boolean flags like isLoading and isError?

Rendering Different Tool Types

Not all tool results should be rendered the same way. A weather result should look different from a code execution result or a database query result.

function ToolResultRenderer({ toolName, result }: {
  toolName: string;
  result: unknown;
}) {
  switch (toolName) {
    case "get_weather":
      return <WeatherCard data={result as WeatherData} />;

    case "run_code":
      return <CodeOutput output={result as CodeResult} />;

    case "search_web":
      return <SearchResults results={result as SearchResult[]} />;

    case "query_database":
      return <DataTable rows={result as Record<string, unknown>[]} />;

    default:
      return <JsonPreview data={result} />;
  }
}

The fallback is important. You'll add new tools over time, and a generic JSON preview ensures nothing breaks when a tool result doesn't have a custom renderer yet.

Rich Tool Results

The best implementations render tool results as first-class UI elements, not just text. When the model calls a search_products tool, you don't show [{name: "Widget", price: 29.99}]. You show actual product cards with images, prices, and "Add to cart" buttons. The tool result becomes interactive UI.

function ProductSearchResult({ products }: { products: Product[] }) {
  return (
    <div className="grid grid-cols-2 gap-3">
      {products.map(product => (
        <ProductCard
          key={product.id}
          name={product.name}
          price={product.price}
          image={product.image}
          onAddToCart={() => addToCart(product.id)}
        />
      ))}
    </div>
  );
}

The model provides the data. Your UI provides the experience.

Error Handling: When Tools Fail

Tools fail. APIs time out, rate limits hit, permissions are denied, arguments are invalid. How you handle this determines whether your AI experience feels robust or fragile.

The Error Loop

When a tool fails, you have a choice: tell the model about the failure (so it can adapt), or silently retry. Usually, telling the model is better:

async function executeToolCall(
  toolName: string,
  args: Record<string, unknown>,
  toolCallId: string
): Promise<ToolResultMessage> {
  try {
    const result = await toolRegistry[toolName](args);
    return {
      tool_use_id: toolCallId,
      content: JSON.stringify(result),
    };
  } catch (error) {
    return {
      tool_use_id: toolCallId,
      content: JSON.stringify({
        error: true,
        message: error instanceof Error ? error.message : "Tool execution failed",
      }),
      is_error: true,
    };
  }
}

When the model receives an error result, it typically does one of three things:

  1. Retries with different arguments -- "The city name was ambiguous, let me try 'Tokyo, Japan'"
  2. Falls back to a different tool -- "Weather API is down, let me search the web instead"
  3. Tells the user -- "I wasn't able to fetch the weather. The service might be temporarily unavailable."

This is significantly better than your code silently retrying in a loop. The model can make intelligent decisions about how to recover.

Timeouts

Always set timeouts on tool execution. A tool that hangs for 30 seconds destroys the user experience:

async function executeWithTimeout<T>(
  fn: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number = 10000
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  try {
    // Race the tool against a rejection tied to the abort signal. Passing the
    // signal to fn also lets signal-aware tools (like fetch) cancel their work.
    return await Promise.race([
      fn(controller.signal),
      new Promise<never>((_, reject) =>
        controller.signal.addEventListener("abort", () =>
          reject(new Error(`Tool timed out after ${timeoutMs}ms`))
        )
      ),
    ]);
  } finally {
    clearTimeout(timer);
  }
}
Common Trap

Don't retry tool calls aggressively on the frontend. If a tool call fails, send the error back to the model and let it decide what to do. The model might choose a different tool, adjust its arguments, or gracefully inform the user. Retrying blindly can burn API credits, hit rate limits, and create a worse experience than just telling the user something went wrong.

Frontend-Side Tools: The LLM Controls Your UI

This is where tool calling gets really powerful for frontend developers. Instead of just calling backend APIs, you can define tools that control the UI itself. The model becomes an intelligent controller of your application.

const frontendTools = [
  {
    name: "scroll_to_section",
    description: "Scroll the page to a specific section",
    parameters: {
      type: "object",
      properties: {
        sectionId: { type: "string", description: "The section DOM id to scroll to" }
      },
      required: ["sectionId"]
    },
    execute: ({ sectionId }: { sectionId: string }) => {
      document.getElementById(sectionId)?.scrollIntoView({ behavior: "smooth" });
      return { success: true, scrolledTo: sectionId };
    }
  },
  {
    name: "apply_filter",
    description: "Filter the product list by category or price range",
    parameters: {
      type: "object",
      properties: {
        category: { type: "string" },
        maxPrice: { type: "number" }
      }
    },
    execute: ({ category, maxPrice }: { category?: string; maxPrice?: number }) => {
      applyFilters({ category, maxPrice });
      return { success: true, activeFilters: { category, maxPrice } };
    }
  },
  {
    name: "open_modal",
    description: "Open a dialog to show detailed information",
    parameters: {
      type: "object",
      properties: {
        modalType: { type: "string", enum: ["product_detail", "help", "settings"] },
        data: { type: "object" }
      },
      required: ["modalType"]
    },
    execute: ({ modalType, data }: { modalType: string; data?: unknown }) => {
      openModal(modalType, data);
      return { success: true, opened: modalType };
    }
  }
];

Now a user can type "Show me jackets under $100" and the model calls apply_filter with { category: "jackets", maxPrice: 100 }. The UI updates, the model sees the result, and responds "I've filtered the products to show jackets under $100. Here are 12 results."

The user interacted with natural language. The UI responded programmatically. No button was clicked.

Quiz
Why is it important to return a result from frontend-side tool executions back to the model?

Multi-Step Agent Loops

Single tool calls are useful. Multi-step chains are where agents emerge. The model calls one tool, reads the result, decides it needs another tool, calls that, and keeps going until it has everything it needs to answer.

async function agentLoop(
  messages: Message[],
  tools: ToolDefinition[],
  maxSteps: number = 10
): Promise<Message[]> {
  let currentMessages = [...messages];
  let steps = 0;

  while (steps < maxSteps) {
    const response = await callLLM(currentMessages, tools);
    currentMessages.push(response);

    const toolCalls = extractToolCalls(response);

    if (toolCalls.length === 0) {
      break;
    }

    const toolResults = await Promise.all(
      toolCalls.map(async (tc) => {
        const result = await executeToolCall(tc.name, tc.args, tc.id);
        return result;
      })
    );

    currentMessages.push(...toolResults);
    steps++;
  }

  return currentMessages;
}
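The extractToolCalls helper above was left undefined; here is one plausible implementation for the OpenAI wire format shown earlier (an Anthropic version would filter tool_use content blocks instead):

```typescript
type ExtractedToolCall = { id: string; name: string; args: Record<string, unknown> };

// Pull tool calls out of an OpenAI-style assistant message
function extractToolCalls(message: {
  role: string;
  tool_calls?: { id: string; function: { name: string; arguments: string } }[];
}): ExtractedToolCall[] {
  if (!Array.isArray(message.tool_calls)) return [];
  return message.tool_calls.map(tc => ({
    id: tc.id,
    name: tc.function.name,
    args: JSON.parse(tc.function.arguments) as Record<string, unknown>,
  }));
}
```

Returning an empty array when there are no tool calls is what lets the loop's `toolCalls.length === 0` check terminate cleanly.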

The maxSteps Guard

Always cap the number of iterations. Without a limit, a confused model can loop forever -- calling tools that return results that trigger more tool calls that return results. This burns API credits and CPU. A reasonable default is 5-10 steps for most use cases.

Parallel Tool Calls

Models can request multiple tool calls in a single turn. When the model outputs two tool_use blocks at once, execute them in parallel:

// The model might return multiple tool calls at once:
// tool_use: get_weather({ city: "Tokyo" })
// tool_use: get_weather({ city: "London" })

// Execute in parallel, not sequentially:
const results = await Promise.all(
  toolCalls.map(tc => executeToolCall(tc.name, tc.args, tc.id))
);

This is exactly like the sequential vs parallel await pattern from async/await. Independent tool calls should always run in parallel.

Visualizing Multi-Step Execution

For multi-step agents, the UI needs to show progress through each step. A vertical timeline works well:

function AgentTimeline({ steps }: { steps: AgentStep[] }) {
  return (
    <ol className="relative border-l border-[var(--color-border)]">
      {steps.map((step, i) => (
        <li key={i} className="mb-6 ml-4">
          <StepIndicator status={step.status} />
          <h4 className="text-sm font-medium">{step.toolName}</h4>
          {step.status === "executing" && <LoadingDots />}
          {step.status === "complete" && (
            <ToolResultRenderer
              toolName={step.toolName}
              result={step.result}
            />
          )}
        </li>
      ))}
    </ol>
  );
}

Each step appears as it happens, with a loading indicator for the current step. Users can see exactly what the agent is doing, building trust through transparency.

Putting It All Together: The Complete Implementation

Here's how all the pieces connect in a real chat interface:

function useToolCallingChat() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [toolStates, setToolStates] = useState<Map<string, ToolCallState>>(new Map());

  async function sendMessage(content: string) {
    const userMessage: Message = { role: "user", content };
    setMessages(prev => [...prev, userMessage]);

    const stream = await streamChat([...messages, userMessage], tools);

    for await (const event of stream) {
      switch (event.type) {
        case "text-delta":
          appendToLastMessage(event.delta);
          break;

        case "tool-input-start":
          setToolStates(prev => new Map(prev).set(event.toolCallId, {
            status: "pending",
            toolName: event.toolName,
          }));
          break;

        case "tool-input-available": {
          setToolStates(prev => new Map(prev).set(event.toolCallId, {
            status: "executing",
            toolName: event.toolName,
            args: event.args,
          }));

          // Safe to execute now -- the full argument JSON has arrived
          const result = await executeToolCall(
            event.toolName,
            event.args,
            event.toolCallId
          );

          setToolStates(prev => new Map(prev).set(event.toolCallId, {
            status: "complete",
            toolName: event.toolName,
            result: result.content,
          }));
          break;
        }
      }
    }
  }

  return { messages, toolStates, sendMessage };
}
Quiz
In a streaming tool call, when is it safe to execute the tool function?

Common Mistakes

What developers do, and what they should do instead:

  1. Executing tool calls without validating the arguments from the LLM. LLMs are probabilistic -- they might output { city: 123 } instead of { city: 'Tokyo' }, and without validation you pass garbage to your API and get cryptic errors that are hard to debug. Instead, always validate tool arguments with Zod or JSON Schema before executing; the model can produce malformed JSON, missing required fields, or hallucinated values.
  2. Trying to parse tool arguments from partial streaming chunks. Partial JSON like '{"city":"Tok' is not valid JSON, and JSON.parse will throw. Instead, accumulate all argument chunks into a buffer and parse only when the input-complete signal arrives.
  3. Hiding tool calls from users entirely. When users see the model 'thinking' for 5 seconds with no indication of what's happening, they lose trust; showing 'Searching weather data...' with a loading indicator builds transparency, and the best AI products (ChatGPT, Claude) show tool calls prominently. Instead, show tool execution as collapsible inline cards with tool name, status, and result, and let users expand for details.
  4. Running multi-step agent loops without a maximum step limit. A confused model can call tools in an infinite loop, burning API credits and hanging the UI; without a cap, a single user message could trigger dozens of API calls. Instead, always cap iterations with a maxSteps parameter (typically 5-10) and break the loop when no more tool calls are returned.
  5. Retrying failed tool calls automatically without telling the model. The model can make intelligent recovery decisions -- adjust arguments, try a different tool, or gracefully explain the failure -- while silent retries waste resources and bypass that ability. Instead, send the error back to the model as a tool_result with is_error: true and let it decide whether to retry, use a fallback tool, or inform the user.

Key Rules

  1. The LLM never executes tools. It outputs structured JSON (tool name + arguments). Your code executes and sends the result back.
  2. Always validate tool arguments before execution. LLMs are probabilistic and can produce malformed or hallucinated parameter values.
  3. Model tool call states as a discriminated union (pending, streaming_args, executing, complete, error) -- not boolean flags.
  4. During streaming, accumulate argument chunks and only parse when the complete signal fires. Partial JSON is not parseable.
  5. Always cap agent loops with maxSteps. A model can call tools indefinitely without a guard, burning credits and hanging the UI.
  6. Send tool errors back to the model instead of silently retrying. The model can make intelligent recovery decisions.
  7. Frontend-side tools let the LLM control UI actions (scroll, filter, navigate). Always return execution results so the model knows what happened.