AI Engineering · Topic 6 of 8

AI Agents & Tool Calling


What is an agent?

An agent is a system where an LLM drives a perception → reasoning → action loop that runs multiple steps until a goal is reached or a stopping condition is met. The key distinction from a single LLM call:

  • LLM call: question in, answer out. One round trip.
  • Agent: the model decides what to do next, executes it (via tools), observes the result, and continues — potentially for tens of steps.

The model is the reasoning engine. Tools are the actuators. Memory determines what context the model has at each step.

Tool calling mechanics

When you define tools, you pass their JSON Schema descriptions alongside the messages. The LLM does not execute tools — it decides to call them by returning a structured tool_calls response instead of a content string:

const tools = [{
  type: "function",
  function: {
    name: "search_web",
    description: "Search the web for current information",
    parameters: {
      type: "object",
      properties: {
        query: { type: "string", description: "The search query" }
      },
      required: ["query"]
    }
  }
}]

const response = await openai.chat.completions.create({ model: "gpt-4o", messages, tools })

if (response.choices[0].finish_reason === "tool_calls") {
  const toolCall = response.choices[0].message.tool_calls![0]
  const args = JSON.parse(toolCall.function.arguments)
  const result = await searchWeb(args.query)  // your code executes the tool

  // feed result back
  messages.push(response.choices[0].message)  // model's tool_call message
  messages.push({ role: "tool", tool_call_id: toolCall.id, content: JSON.stringify(result) })
  // call the model again — it now reasons over the tool result
}

The critical insight: the tool execution loop lives in your application code, not inside the model. You control retries, error handling, and when to stop.
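That application-side loop can be sketched as follows. This is a minimal illustration, not any particular SDK's API: `callModel` and `executeTool` are placeholder functions you would wire to your LLM client and tool registry, and the message shapes are simplified.

```typescript
type Message = { role: string; content: string; tool_call_id?: string };
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
// The model either requests tool calls or produces a final answer
type ModelReply =
  | { kind: "tool_calls"; message: Message; calls: ToolCall[] }
  | { kind: "final"; answer: string };

async function runAgent(
  goal: string,
  callModel: (messages: Message[]) => Promise<ModelReply>,
  executeTool: (call: ToolCall) => Promise<string>,
  maxSteps = 15  // hard step limit: your stopping condition, not the model's
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: goal }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await callModel(messages);
    if (reply.kind === "final") return reply.answer;  // model decided to stop
    messages.push(reply.message);  // keep the model's tool_call message
    for (const call of reply.calls) {
      const result = await executeTool(call);  // your code runs the tool
      messages.push({ role: "tool", tool_call_id: call.id, content: result });
    }
  }
  return "Step limit reached without a final answer";
}
```

Because the loop is yours, retries, input validation, and stop conditions all slot in here rather than inside the model.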

The ReAct pattern

ReAct (Reason + Act) is the foundational prompting pattern for agents. At each step the model explicitly:

  1. Thought: Reason about the current state and what to do next
  2. Action: Invoke a tool
  3. Observation: Receive the tool result
  4. Repeat until goal is reached, then output a final answer

Thought: I need to find the current stock price of AAPL.
Action: search_web(query="AAPL stock price today")
Observation: AAPL is trading at $213.50 as of 2025-05-14.
Thought: I have the price. I can answer the user now.
Final Answer: Apple (AAPL) is currently trading at $213.50.

Modern LLMs with native tool calling implement ReAct implicitly — the tool_calls response is the Action, and the tool role message is the Observation.

Multi-agent orchestration

Single agents struggle with tasks that require deep specialisation across multiple domains. Multi-agent systems decompose work:

Orchestrator + specialist pattern:

  • Orchestrator receives the top-level goal, plans the subtasks, and delegates to specialists
  • Specialist agents have focused tool sets (web search agent, code execution agent, database query agent)
  • Results flow back to the orchestrator for synthesis

User: "Research competitors and draft a market analysis report"

Orchestrator
  ├── Search Agent: find 5 competitors, their pricing, recent news
  ├── Data Agent: pull our own metrics from internal DB
  └── Writer Agent: draft report given all collected context

Frameworks: LangGraph (graph-based state machines), CrewAI (role-based), AutoGen (conversation-based multi-agent). The patterns matter more than the framework — learn the pattern, adapt to whatever your team uses.
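Stripped of any framework, the orchestrator + specialist pattern is just sequencing agents and threading context forward. A minimal sketch, assuming a fixed plan (in practice the orchestrator LLM would produce the plan) and illustrative specialist names:

```typescript
// Each specialist is an agent with its own focused tool set; here it is
// abstracted as a function of (task, accumulated context)
type Specialist = (task: string, context: string) => Promise<string>;

async function orchestrate(
  goal: string,
  specialists: Record<string, Specialist>,
  plan: { agent: string; task: string }[]
): Promise<string> {
  let context = `Goal: ${goal}`;  // results accumulate here for synthesis
  for (const step of plan) {
    const result = await specialists[step.agent](step.task, context);
    context += `\n[${step.agent}] ${result}`;
  }
  return context;
}
```

Real orchestrators add dynamic planning, parallel delegation, and error recovery, but the data flow (subtask out, result back, context forward) is the same.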

Memory types

Memory type        Scope                         Implementation             Cost
In-context         Current conversation          Messages array             Token cost scales linearly
External vector    Long-term facts               Vector DB retrieval        ~50ms per retrieval
Structured (SQL)   Structured state              DB read/write              Fast, queryable
Episodic           Past conversation summaries   LLM-summarised + stored    Write cost once, cheap to read

Long-running agents hit context window limits. Mitigation strategies: summarise and compress old turns, move durable facts to external memory, or keep a sliding window of only the most recent N turns.
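The sliding-window-plus-summary strategy can be sketched in a few lines. The `summarise` callback is a placeholder (in practice it would be another LLM call), and the message shape is simplified:

```typescript
type Msg = { role: string; content: string };

// Keep the system prompt and the most recent N turns; replace everything
// older with a single summary message
function compressHistory(
  messages: Msg[],
  keepRecent: number,
  summarise: (old: Msg[]) => string
): Msg[] {
  if (messages.length <= keepRecent + 1) return messages;  // nothing to compress
  const [system, ...rest] = messages;
  const old = rest.slice(0, rest.length - keepRecent);
  const recent = rest.slice(rest.length - keepRecent);
  return [
    system,
    { role: "system", content: `Summary of earlier turns: ${summarise(old)}` },
    ...recent,
  ];
}
```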

Failure modes in production

Infinite loops — the model keeps calling tools without reaching a stopping condition. Mitigate with hard step limits (e.g., max 15 iterations) and a “stuck” detector (same tool called with same args twice).
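A stuck detector of that kind is small enough to sketch directly. This is an illustrative helper, not a library API: it flags when the same tool is invoked with identical arguments a second time, so the loop can bail out or inject a corrective message.

```typescript
// Returns a closure that tracks (tool, args) pairs across the agent run
function makeStuckDetector(threshold = 2) {
  const seen = new Map<string, number>();
  return (toolName: string, args: unknown): boolean => {
    const key = `${toolName}:${JSON.stringify(args)}`;
    const count = (seen.get(key) ?? 0) + 1;
    seen.set(key, count);
    return count >= threshold;  // true → treat the agent as stuck
  };
}
```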

Hallucinated tool calls — the model invokes a tool with syntactically valid but semantically wrong arguments (e.g., a made-up customer ID). Validate all tool inputs against real data before executing. Never pass unvalidated model output directly to databases.

Context exhaustion — each tool result appended to messages grows the context. At step 10 with verbose tool outputs, you may exceed the context window. Truncate or summarise tool results before adding to context.
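A simple cap on each result is often enough. The character budget here is illustrative; in practice you would budget in tokens using your model's tokenizer:

```typescript
// Truncate a tool result before appending it to the messages array,
// leaving a marker so the model knows content was cut
function truncateToolResult(result: string, maxChars = 4000): string {
  if (result.length <= maxChars) return result;
  return result.slice(0, maxChars) +
    `\n…[truncated ${result.length - maxChars} chars]`;
}
```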

Cascading tool errors — a tool failure mid-chain may cause downstream tools to operate on missing data, silently producing wrong answers. Implement explicit error handling in the tool result and instruct the model how to recover.
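One way to make errors explicit is to never let a tool throw into the loop: catch the failure and return it as a structured tool result, so the model can see it and decide how to recover (retry, try a different tool, ask the user). A minimal wrapper sketch:

```typescript
// Wrap any tool execution; the returned string is what goes into the
// role: "tool" message, success or failure
async function safeExecute(run: () => Promise<string>): Promise<string> {
  try {
    return JSON.stringify({ ok: true, result: await run() });
  } catch (err) {
    return JSON.stringify({ ok: false, error: String(err) });
  }
}
```

Pair this with a system-prompt instruction telling the model what an `ok: false` result means and what recovery steps are allowed.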

Interview angle

“Design an AI agent that can answer questions about our internal codebase.” The answer should cover: tool set (file search, code execution, grep), retrieval strategy (embed codebase chunks in vector DB), memory management (how to handle long conversations), guardrails (what tools the agent must NOT call — e.g., production DB writes), observability (log every tool call and result for debugging), and the step limit / human escalation path.