Build a Basic AI Agent in TypeScript, Part 3: Multi-Step Skill Orchestration

January 30, 2026

Overview

Part 1 established the core loop, and Part 2 added model-driven tool calling. Part 3 focuses on the next mental model shift: from one tool at a time to coordinated skills with clear execution boundaries. The goal is not to build a full production platform. It is to understand the core concepts you need before scaling complexity.

TLDR: Tools execute one action. Skills coordinate related actions. A useful agent needs simple routing plus lightweight patterns for safe, observable, and controllable execution.

What Part 3 Adds Beyond Part 2

Part 2 solved one key problem: how to let the model request a tool and return results in a loop.

Part 3 solves a different problem: how to organize many tool-capable behaviors without turning runtime logic into spaghetti.

In practical terms:

  • Part 2 answer: “Can the model call a tool?”
  • Part 3 answer: “Can the runtime choose and coordinate the right skill safely?”

Core Concept 1, Skills vs Tools

A tool is a single, narrowly scoped capability implemented in ordinary code, for example get_weather(location).

A skill is a higher-level unit of behavior that can:

  • decide which tool(s) to use,
  • control sequence across steps,
  • normalize outputs for the agent response.

Think of it this way:

Tool  = one action adapter
Skill = a small workflow that may use one or more tools

This distinction matters because users ask for outcomes, not function calls.
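
One way to make the distinction concrete in TypeScript is sketched below. The Tool and Skill interfaces and the weatherBriefing skill are hypothetical names chosen for illustration, not part of any SDK, and getWeather returns canned data instead of calling a real service.

```typescript
// A tool: one action adapter with typed input and output.
interface Tool<In, Out> {
  name: string;
  run(input: In): Promise<Out>;
}

// A skill: a small workflow that may use one or more tools.
interface Skill<In> {
  name: string;
  tools: string[]; // names of the tools this skill may use
  run(input: In): Promise<string>;
}

// Hypothetical tool with canned data (no real weather API is called).
const getWeather: Tool<{ location: string }, { tempC: number; rain: boolean }> = {
  name: "get_weather",
  run: async ({ location }) =>
    location === "Oslo" ? { tempC: 2, rain: true } : { tempC: 21, rain: false }
};

// Hypothetical skill: sequences the tool calls and normalizes the output
// into one user-facing answer.
const weatherBriefing: Skill<{ locations: string[] }> = {
  name: "weather_briefing",
  tools: [getWeather.name],
  run: async ({ locations }) => {
    const lines: string[] = [];
    for (const location of locations) {
      const w = await getWeather.run({ location });
      lines.push(`${location}: ${w.tempC} C${w.rain ? ", bring an umbrella" : ""}`);
    }
    return lines.join("\n");
  }
};
```

The user asks for a briefing (an outcome), and the skill decides how many get_weather calls that takes.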

Claude Skills Example, How Skills Plug Into the Loop

If you want a concrete implementation model, Claude Skills are a useful reference because the pattern is explicit and easy to reason about.

High-level shape:

  • Each skill lives on disk as a directory containing a SKILL.md file.
  • The SDK discovers them from configured locations.
  • The model invokes them when descriptions match user intent.
  • Your runtime still controls tool access and loop boundaries.

What a skill contains:

my-skill/
├── SKILL.md        (required)
├── scripts/        (optional, executable helpers)
├── references/     (optional, docs loaded as needed)
└── assets/         (optional, templates, fonts, icons)

Two ideas matter most for agent loops: progressive discovery and multi-step execution.

Progressive Discovery

Skills are designed so the model can discover the right behavior without loading everything every turn:

  1. Frontmatter in SKILL.md supports discovery and routing.
  2. The SKILL.md body is loaded when the skill is relevant.
  3. Linked files are opened only when needed.

This keeps context focused while still supporting deeper workflows.
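
The three stages can be sketched with a small index that keeps only frontmatter in memory and re-reads the body on demand. This is an illustration under stated assumptions, not how any SDK implements discovery: parseSkillFile and LazySkillIndex are hypothetical names, the file reader is injected so the example stays self-contained, and the frontmatter parser handles only single-line key: value pairs (real YAML needs a proper parser).

```typescript
// Metadata kept in context on every turn: small and cheap.
interface SkillMeta {
  name: string;
  description: string;
}

// Splits a SKILL.md document into simple `key: value` frontmatter and the body.
function parseSkillFile(text: string): { meta: SkillMeta; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(text);
  if (!match) throw new Error("SKILL.md is missing frontmatter");
  const fields: Record<string, string> = {};
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx > 0) fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  const name = fields["name"];
  const description = fields["description"];
  if (!name || !description) throw new Error("frontmatter needs name and description");
  return { meta: { name, description }, body: (match[2] ?? "").trim() };
}

// Exposes metadata eagerly and loads full skill bodies only on request.
class LazySkillIndex {
  readonly metadata: SkillMeta[] = [];
  private paths = new Map<string, string>();
  private readFile: (path: string) => string;

  constructor(paths: string[], readFile: (path: string) => string) {
    this.readFile = readFile;
    for (const path of paths) {
      // Stage 1: keep only the frontmatter for discovery and routing.
      const { meta } = parseSkillFile(readFile(path));
      this.metadata.push(meta);
      this.paths.set(meta.name, path);
    }
  }

  // Stage 2: read the body only once the skill has been selected.
  loadFullSkill(name: string): string {
    const path = this.paths.get(name);
    if (path === undefined) throw new Error(`unknown skill: ${name}`);
    return parseSkillFile(this.readFile(path)).body;
  }
}
```

Stage 3 (linked files under references/ or scripts/) would follow the same shape: resolve the path when the body mentions it, and read it only then.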

Why This Helps Multi-Step Flows

For multi-step requests, skills encode repeatable workflow guidance, not only single tool calls.

  • Connectors and tools define what the runtime can do.
  • Skills define how the runtime should do it across steps.
  • Multiple skills can compose in the same session when tasks overlap.

This is why skills improve consistency: the model is not reinventing the process from scratch on every prompt.

Minimal SKILL.md example:

---
name: order-support
description: Handle order lookup, shipping-change eligibility checks, and guided next steps.
---

# Order Support Skill

## When to Use
- User asks about order status
- User asks to change shipping after checkout

## Instructions
1. Extract order id from user text.
2. Check order status.
3. Check change-window policy.
4. Ask for confirmation before submitting change requests.
5. Return a concise final response.

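Then wire it into a custom agent loop in TypeScript. The helpers (loadSkillsFromFilesystem, llmPlan, executePlan, readUserInput, render) and the conversation, toolDefinitions, and toolRegistry values are placeholders for your own implementations: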

// Startup phase, discover skills once and keep a lightweight index.
const skillIndex = loadSkillsFromFilesystem({
  cwd: "/path/to/project",
  settingSources: ["project", "user"]
});

async function runAgentLoop(): Promise<void> {
  while (true) {
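    const userText = await readUserInput();
    if (!userText || userText.trim() === "") break;

    // Record the user turn so the planner sees it in the history.
    conversation.push({ role: "user", content: userText });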

    // Build context with conversation history, tool definitions, and skill metadata.
    const plan = await llmPlan({
      messages: conversation,
      tools: toolDefinitions,
      skillMetadata: skillIndex.metadata
    });

    // Progressive discovery, load the full skill only when selected.
    const activeSkill =
      plan.type === "use_skill"
        ? skillIndex.loadFullSkill(plan.skillName)
        : null;

    // Execute tool calls with runtime controls.
    const result = await executePlan({
      plan,
      activeSkill,
      toolRegistry,
      maxSteps: 6,
      requireConfirmationFor: ["submit_change_request"]
    });

    conversation.push({ role: "assistant", content: result.responseText });
    render(result.responseText);
  }
}

This fits the same architecture we built in this series. The loop remains simple. Skills help the model discover and compose multi-step behavior, while your runtime still controls actual tool execution and safety policy.

Note: When using the Claude Agent SDK, set settingSources and include the Skill tool in the allowed tools so that skills can be discovered and invoked from filesystem locations.

Core Concept 2, Simple Skill Routing

Routing is the decision of which skill handles a request.

In a minimal architecture, routing can stay very simple:

  1. The model classifies intent.
  2. The runtime maps intent to one skill.
  3. The skill executes and returns a structured result.

When two skills overlap, use explicit precedence rules. For example, “account safety” can outrank “general account info”. This keeps behavior predictable even when user prompts are ambiguous.
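
A minimal sketch of precedence-based routing follows. The intent labels, skill names, and precedence numbers are hypothetical; the point is only that the conflict rule lives in a visible table in the runtime rather than in the model's output.

```typescript
type Intent = "account_safety" | "account_info" | "order_support";

interface RouteRule {
  skill: string;
  intents: Intent[];
  precedence: number; // higher wins when several rules match
}

// Hypothetical rule table; skill names are illustrative.
const rules: RouteRule[] = [
  { skill: "AccountInfoSkill", intents: ["account_info"], precedence: 10 },
  { skill: "AccountSafetySkill", intents: ["account_safety"], precedence: 100 },
  { skill: "OrderSupportSkill", intents: ["order_support"], precedence: 50 }
];

// The model may report several candidate intents for an ambiguous prompt.
// The runtime, not the model, resolves the conflict.
function routeToSkill(detected: Intent[], table: RouteRule[] = rules): string {
  const matches = table.filter((r) => r.intents.some((i) => detected.includes(i)));
  if (matches.length === 0) return "FallbackSkill"; // explicit fallback path
  return matches.reduce((best, r) => (r.precedence > best.precedence ? r : best)).skill;
}
```

With this table, a prompt classified as both account_info and account_safety goes to AccountSafetySkill, and a prompt with no recognized intent goes to the fallback.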

Core Concept 3, Basic Multi-Step Orchestration

Orchestration means coordinating multiple internal steps to complete one user request.

Example flow:

User asks: "Can you check order 4821 and tell me if I can still change shipping?"
    -> Route to OrderSupportSkill
    -> Step 1: fetch order status
    -> Step 2: evaluate policy rules
    -> Step 3: return user-facing answer with next action

This is the conceptual jump from Part 2. The runtime is no longer just “tool call in, tool result out”. It now executes a small workflow with clear boundaries.
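
The flow above can be sketched as an ordered list of steps that share state and stop at the first failure. The step names and canned data are assumptions for illustration; in a real skill each step would call a tool.

```typescript
interface OrderState {
  orderId: string;
  status?: string;
  canChange?: boolean;
}

type StepResult = { ok: true; state: OrderState } | { ok: false; reason: string };
type Step = { name: string; run: (state: OrderState) => StepResult };

// Hypothetical steps with canned data standing in for real tool calls.
const steps: Step[] = [
  {
    name: "fetch_order_status",
    run: (s) =>
      s.orderId === "4821"
        ? { ok: true, state: { ...s, status: "PROCESSING" } }
        : { ok: false, reason: `Order ${s.orderId} not found.` }
  },
  {
    name: "evaluate_policy",
    run: (s) => ({ ok: true, state: { ...s, canChange: s.status === "PROCESSING" } })
  }
];

// Runs steps in order, stops at the first failure, and records which steps ran.
function runWorkflow(orderId: string): { answer: string; trace: string[] } {
  let state: OrderState = { orderId };
  const trace: string[] = [];
  for (const step of steps) {
    trace.push(step.name);
    const result = step.run(state);
    if (!result.ok) return { answer: result.reason, trace };
    state = result.state;
  }
  const answer = state.canChange
    ? `Order ${orderId} is ${state.status}. You can still change shipping.`
    : `Order ${orderId} is ${state.status}. Shipping can no longer be changed.`;
  return { answer, trace };
}
```

The trace array is what makes the boundaries clear: for any request you can see exactly which steps ran before the answer was produced.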

Core Concept 4, Safe, Observable, Controllable Execution

This is the most important concept in this part.

Safe

Safe execution means the agent does not perform unintended or duplicated actions.

Lightweight patterns:

  • validate required inputs before side-effecting actions,
  • use idempotency keys for action-like operations,
  • require explicit confirmation for high-impact actions.
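
The three patterns can be combined in a single guard in front of a side-effecting call. This is a sketch under assumptions: submitChangeSafely and the ChangeRequest shape are hypothetical, the key store is an in-memory Map (a real system would persist keys durably), and ticket creation stands in for the real side effect.

```typescript
interface ChangeRequest {
  orderId: string;
  newAddress: string;
  idempotencyKey: string;
  userConfirmed: boolean;
}

type SubmitResult =
  | { ok: true; ticketId: string; replayed: boolean }
  | { ok: false; reason: string };

// In-memory store for illustration only.
const completed = new Map<string, string>();
let nextTicket = 1;

function submitChangeSafely(req: ChangeRequest): SubmitResult {
  // 1. Validate required inputs before any side effect.
  if (!/^\d+$/.test(req.orderId)) return { ok: false, reason: "invalid order id" };
  if (req.newAddress.trim() === "") return { ok: false, reason: "missing address" };

  // 2. Require explicit confirmation for a high-impact action.
  if (!req.userConfirmed) return { ok: false, reason: "confirmation required" };

  // 3. Replay the stored result when the same key is seen again.
  const existing = completed.get(req.idempotencyKey);
  if (existing !== undefined) return { ok: true, ticketId: existing, replayed: true };

  const ticketId = `chg_${nextTicket++}`; // stands in for the real side effect
  completed.set(req.idempotencyKey, ticketId);
  return { ok: true, ticketId, replayed: false };
}
```

If the model retries the same request, the second call returns the original ticket instead of creating a duplicate.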

Observable

Observable execution means operators can answer, “What happened, where, and why?”

Minimal per-request fields:

  • request id,
  • selected skill,
  • tool calls made,
  • latency,
  • final status.
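
One lightweight way to capture these fields is a wrapper that emits exactly one structured event per request, whether the skill succeeds or throws. The ExecutionEvent shape and withExecutionLog are hypothetical names, and the in-memory sink array stands in for whatever logger you actually use.

```typescript
interface ExecutionEvent {
  requestId: string;
  skill: string;
  toolCalls: string[];
  latencyMs: number;
  status: "success" | "failed";
}

// Collects events in memory; replace with a real logger in practice.
const sink: ExecutionEvent[] = [];

// Wraps a skill run and always emits one event, even when the skill throws.
async function withExecutionLog(
  requestId: string,
  skill: string,
  run: (recordToolCall: (name: string) => void) => Promise<string>
): Promise<string> {
  const toolCalls: string[] = [];
  const startedAt = Date.now();
  let status: ExecutionEvent["status"] = "failed";
  try {
    const response = await run((name) => {
      toolCalls.push(name);
    });
    status = "success";
    return response;
  } finally {
    sink.push({ requestId, skill, toolCalls, latencyMs: Date.now() - startedAt, status });
  }
}
```

The skill receives a recordToolCall callback and invokes it before each tool call, so the event lists the tools in the order they ran.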

Controllable

Controllable execution means runtime policy stays in code, not hidden in model output.

Simple controls:

  • max skill steps per request,
  • allowlist of executable tools per skill,
  • clear fallback path when routing is uncertain.
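
A per-request guard is one way to keep the first two controls in code. In this sketch the policy table, skill name, and createGuard function are hypothetical; the runtime would consult the guard before executing any tool call the model requests.

```typescript
interface SkillPolicy {
  allowedTools: string[];
  maxSteps: number;
}

// Hypothetical policy table kept in code, outside the model's control.
const policies: Record<string, SkillPolicy> = {
  order_support: { allowedTools: ["lookup_order", "check_change_window"], maxSteps: 3 }
};

type Decision = { allowed: true } | { allowed: false; reason: string };

// Creates a per-request guard that is checked before every tool call.
function createGuard(skill: string): (toolName: string) => Decision {
  const policy = policies[skill];
  let stepsUsed = 0;
  return (toolName) => {
    if (!policy) return { allowed: false, reason: `no policy for skill ${skill}` };
    if (!policy.allowedTools.includes(toolName)) {
      return { allowed: false, reason: `${toolName} is not allowlisted for ${skill}` };
    }
    if (stepsUsed >= policy.maxSteps) {
      return { allowed: false, reason: `step budget of ${policy.maxSteps} exhausted` };
    }
    stepsUsed += 1; // only permitted calls consume budget
    return { allowed: true };
  };
}
```

A denied decision carries a reason string, which can go straight into the execution log and into the fallback response.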

You can implement these controls without building a large framework. Even minimal guardrails make failures easier to prevent, and much easier to trace when they do happen.

TypeScript Pseudocode, Multi-Step Skills in the Agent Loop

Note: This is simplified pseudocode to illustrate the concept, not an actual implementation.

type Route = "weather" | "order_support" | "fallback";

const toolRegistry = {
  lookup_order: async (args: { orderId: string }) => {
    // Calls your real TypeScript tool function.
    return { ok: true, data: { id: args.orderId, status: "PROCESSING", placedAt: "2026-01-21T10:11:12Z" } };
  },
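  check_change_window: async (args: { orderStatus: string; placedAt: string }) => {
    // Encapsulates policy lookup logic. `ok` reports whether the check itself ran;
    // eligibility is carried in `data` so "not eligible" is not treated as a failure.
    return { ok: true, data: { isEligible: args.orderStatus === "PROCESSING" } };
  },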
  submit_change_request: async (args: {
    orderId: string;
    newAddress: string;
    idempotencyKey: string;
  }) => {
    // Side-effecting operation guarded by idempotency key.
    return { ok: true, data: { ticketId: "chg_12345" } };
  }
};

async function runOrderSupportSkill(input: {
  requestId: string;
  userText: string;
  userConfirmed: boolean;
  requestedAddress: string;
}): Promise<{ status: "success" | "failed"; responseText: string }> {
  const state: Record<string, unknown> = {
    userText: input.userText,
    userConfirmed: input.userConfirmed,
    requestedAddress: input.requestedAddress
  };

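  // Step 1: Extract order id (regex stand-in for model extraction + validation).
  const idMatch = /\b\d+\b/.exec(input.userText);
  if (!idMatch) {
    return { status: "failed", responseText: "Please provide an order number." };
  }
  state.orderId = idMatch[0];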

  // Step 2: Call lookup tool.
  const order = await toolRegistry.lookup_order({
    orderId: String(state.orderId)
  });
  if (!order.ok) {
    return { status: "failed", responseText: "Order lookup failed." };
  }
  state.order = order.data;

  // Step 3: Call policy tool.
  const policy = await toolRegistry.check_change_window({
    orderStatus: String((state.order as { status: string }).status),
    placedAt: String((state.order as { placedAt: string }).placedAt)
  });
  if (!policy.ok) {
    return { status: "failed", responseText: "Policy check failed." };
  }
  state.isEligible = policy.data?.isEligible === true;

  // Step 4: Optionally call side-effecting tool.
  if (state.isEligible === true && state.userConfirmed === true) {
    const submit = await toolRegistry.submit_change_request({
      orderId: String(state.orderId),
      newAddress: String(state.requestedAddress),
      idempotencyKey: `${input.requestId}:shipping-change`
    });
    if (!submit.ok) {
      return { status: "failed", responseText: "Change request failed." };
    }
    state.changeTicket = submit.data?.ticketId;
  }

  // Step 5: Compose final response.
  const responseText =
    state.isEligible === true
      ? state.userConfirmed === true
        ? `Shipping change submitted. Ticket: ${state.changeTicket}`
        : "Your order is eligible for shipping changes. Confirm to continue."
      : "This order can no longer be changed under current policy.";

  return { status: "success", responseText };
}

class AgentRuntime {
  async handleRequest(input: {
    requestId: string;
    userText: string;
    userConfirmed: boolean;
    requestedAddress: string;
  }): Promise<string> {
    // Start the clock before routing so logged latency covers the whole request.
    const startedAt = Date.now();
    const route = await this.routeIntent(input.userText);

    const result =
      route === "order_support"
        ? await runOrderSupportSkill(input)
        : { status: "success" as const, responseText: "Fallback response." };

    this.logExecution({
      requestId: input.requestId,
      route,
      latencyMs: Date.now() - startedAt,
      status: result.status
    });

    return result.responseText;
  }

  private async routeIntent(_userText: string): Promise<Route> {
    // Model-assisted route selection.
    return "order_support";
  }

  private logExecution(event: {
    requestId: string;
    route: Route;
    latencyMs: number;
    status: "success" | "failed";
  }): void {
    // Structured logs for observability.
  }
}

The key takeaway is that one skill can run an ordered sequence of steps, and each step can decide whether to call zero, one, or many tools. The agent loop still stays simple: route the request, run the skill, log the outcome, and return the response.

The important part is not the syntax. The important part is separation of concerns:

  • route selection,
  • multi-step skill execution,
  • structured execution logging.

Common Pitfalls

Treating every behavior as a tool

This creates large, hard-to-manage tool surfaces. Group related behavior into skills instead.

Hidden routing rules

If routing logic is not explicit, behavior drifts and becomes hard to debug. Keep conflict handling and fallback rules visible.

No execution record

Without basic logs, misroutes and failures are guesswork. Always emit minimal structured execution events.

Lessons Learned

  • Part 1 gave the loop, Part 2 gave tool calling, Part 3 gives runtime structure.
  • Skills are the right abstraction for multi-step outcomes.
  • A small set of guardrails can make agent behavior much easier to trust and operate.
  • Safe, observable, controllable execution is a design choice, not an optional add-on.

Series Wrap-Up

You now have a complete progression:

  1. Core loop and state management.
  2. Tool calling and model-driven routing.
  3. Skill orchestration and execution guardrails.

This is a practical baseline for building agent systems that stay understandable as they grow.

A natural Part 4 would go deeper on production operations: deployment guardrails, human approval design, incident readiness, and long-term quality evaluation.