Overview
At its core, an agent is a control loop wrapped around model calls and state management. This post introduces that loop first, then shows a runnable TypeScript implementation that matches the same flow. The aim is to keep the mental model simple while still giving readers something practical to execute.
TL;DR: A basic agent is mostly a deterministic loop that sends conversation history to a model, appends responses, and repeats until an exit condition is met.
Full Series
- Part 1, The Core Loop (you are here)
- Part 2, Tool Calling and LLM Intent Routing
- Part 3, Multi-Step Skill Orchestration
- Part 4, Production Readiness and Operational Guardrails
Sections
- What We Are Building in Part 1
- Core Concepts and Keywords
- Architecture at a Glance
- Turn Lifecycle
- Clarifying Two Terms
- TypeScript Implementation
- Why This Design Matters
- Common Pitfalls in Basic Agent Loops
- Lessons Learned
- What Comes Next in Part 2
- Official References
What We Are Building in Part 1
In this part, we are intentionally building the smallest useful baseline:
- A terminal-style chat loop.
- Persistent conversation memory across turns.
- A model call wrapper.
- Basic error handling and exit conditions.
We are not adding tools or skills yet. That is intentional. If the loop is weak, everything built on top will be fragile.
Core Concepts and Keywords
Before code, here are the key terms used throughout the series.
Agent
An agent is a program that repeatedly observes input, decides what to do next, and produces output. In practical LLM apps, the “decide” step often means making a model call and interpreting the response.
Turn
A turn is one cycle of interaction:
- User message enters the system.
- Model generates a response.
- Response is shown and stored.
Conversation State
Conversation state is the full message history sent back to the model on each turn. Most model APIs are stateless by request, so memory lives in the application, not on the model server.
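Because the API remembers nothing between requests, the application resends everything. A minimal sketch of what that looks like; the shapes here are illustrative, and the full implementation appears later in this post:

```typescript
// Minimal sketch of application-owned conversation state.
type Role = "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

const conversation: Message[] = [];

// Turn 1: the application stores both sides of the exchange.
conversation.push({ role: "user", content: "hello" });
conversation.push({ role: "assistant", content: "hi there" });

// Turn 2: the new user message is appended, and the request for
// the next inference call carries ALL prior messages, because
// the model server keeps no state between requests.
conversation.push({ role: "user", content: "what did I say first?" });
const requestPayload = { messages: conversation };

console.log(requestPayload.messages.length); // 3
```

The key point is that `conversation` lives in the process running the loop; the model only ever sees what is put into `requestPayload` on each call.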
Inference
Inference is one model generation call, for example, “Given these messages, produce the next assistant message.”
Deterministic Control Loop
A deterministic control loop means runtime flow is explicit and predictable. The model generates text, but the program decides:
- when to call the model,
- what context to send,
- when to stop,
- how to handle failures.
Architecture at a Glance
┌──────────────────────────────────────────────────────┐
│                 Basic Agent Runtime                  │
├──────────────────────────────────────────────────────┤
│  while (turnCount < maxTurns)                        │
│        │                                             │
│        ▼                                             │
│  Read user input                                     │
│        │                                             │
│        ├── empty input ──▶ Exit                      │
│        │                                             │
│        ▼                                             │
│  Append user message to conversation[]               │
│        │                                             │
│        ▼                                             │
│  Model call (inference) with full conversation[]     │
│        │                                             │
│        ▼                                             │
│  Append assistant message and display response       │
│        │                                             │
│        ▼                                             │
│  Next loop iteration                                 │
└──────────────────────────────────────────────────────┘
This diagram shows system structure and control flow. It answers “what components exist and how data moves through the while loop”.
Turn Lifecycle
Single iteration (Turn N) inside the while loop
↓
Read user input
↓
Append user message to conversation state
↓
Call model with full conversation
↓
Append assistant response to conversation state
↓
Render response
↓
Loop condition checked again
↓
Turn N+1 begins if condition still holds
This diagram is intentionally narrower. It zooms into one turn only and ignores broader component boundaries.
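The lifecycle above can also be sketched as a single function. Everything here is illustrative: `runTurn` and `callModel` are stand-in names, not part of the implementation below.

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

// One turn: append the user input, call the model with the full
// history, append the reply, and return it for rendering.
async function runTurn(
  conversation: Message[],
  userInput: string,
  callModel: (history: Message[]) => Promise<string>,
): Promise<string> {
  conversation.push({ role: "user", content: userInput });
  const reply = await callModel(conversation);
  conversation.push({ role: "assistant", content: reply });
  return reply;
}

// Usage with a trivial fake model that reports the history size.
const history: Message[] = [];
const fakeModel = async (h: Message[]) => `seen ${h.length} messages`;
const reply = await runTurn(history, "hello", fakeModel);
console.log(reply); // seen 1 messages
console.log(history.length); // 2
```

Note that the loop condition check lives outside this function; the turn itself is just append, infer, append, render.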
Clarifying Two Terms
Model Call (Inference)
“Model call” and “inference” refer to the same step. The runtime sends current conversation history to the model API and receives the assistant’s next message.
Append + Display
- Append the assistant response to conversation[] so memory is preserved.
- Display the response in the terminal, for example with console.log.
TypeScript Implementation
This section uses a hybrid approach:
- Start with a short conceptual loop.
- Follow with a runnable implementation.
- Call out runtime details that are useful, but secondary to the core concept.
Conceptual Loop (Read First)
This snippet is intentionally small and focused on loop semantics.
Note: This is simplified pseudocode to illustrate the concept, not the actual implementation.
class Agent {
private conversation: Message[] = [];
async run(): Promise<void> {
let turnCount = 0;
while (turnCount < this.config.maxTurns) {
const input = await this.getUserInput();
if (!input || input.trim() === "") break;
this.conversation.push({
role: "user",
content: input,
timestamp: Date.now(),
});
const assistantText =
await this.runInference(
this.conversation,
);
this.conversation.push({
role: "assistant",
content: assistantText,
timestamp: Date.now(),
});
this.render(assistantText);
turnCount += 1;
}
}
}
Runnable Example (Copy and Run)
Local Setup
The following commands create a minimal runnable TypeScript project.
mkdir basic-agent-loop
cd basic-agent-loop
bun init -y
Create a .env file and add the API key. Bun loads .env files automatically at runtime; see the Bun environment variables documentation.
OPENROUTER_API_KEY=your_api_key_here
src/main.ts
This is a minimal working loop that can be copied and run.
import { createInterface } from "node:readline/promises";
import {
stdin as input,
stdout as output,
} from "node:process";
type Role = "user" | "assistant";
interface Message {
role: Role;
content: string;
timestamp: number;
}
interface ModelClient {
createMessage(input: {
model: string;
messages: Array<{
role: Role;
content: string;
}>;
maxTokens: number;
}): Promise<{ text: string }>;
}
interface AgentConfig {
model: string;
maxTokens: number;
maxTurns: number;
}
class OpenRouterModelClient
implements ModelClient
{
constructor(
private readonly apiKey: string,
) {}
async createMessage(input: {
model: string;
messages: Array<{
role: Role;
content: string;
}>;
maxTokens: number;
}): Promise<{ text: string }> {
const response = await fetch(
"https://openrouter.ai/api/v1/chat/completions",
{
method: "POST",
headers: {
Authorization: `Bearer ${this.apiKey}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: input.model,
messages: input.messages,
max_tokens: input.maxTokens,
}),
},
);
if (!response.ok) {
const errorBody = await response.text();
throw new Error(
`OpenRouter error ${response.status}: ${errorBody}`,
);
}
const data = (await response.json()) as {
choices?: Array<{
message?: {
content?:
| string
| Array<{
type?: string;
text?: string;
}>;
};
}>;
};
const content = data.choices?.[0]?.message?.content;
const text =
typeof content === "string"
? content.trim()
: Array.isArray(content)
? content
.map((item) =>
item.type === "text"
? item.text || ""
: "",
)
.join("\n")
.trim()
: "";
return { text: text || "[No text response]" };
}
}
class Agent {
private conversation: Message[] = [];
constructor(
private readonly modelClient: ModelClient,
private readonly config: AgentConfig,
private readonly getUserInput: () => Promise<
string | null
>,
private readonly render: (
text: string,
) => void,
) {}
async run(): Promise<void> {
let turnCount = 0;
this.render(
"Chat started. Submit empty input to exit.",
);
while (turnCount < this.config.maxTurns) {
const userInput = await this.getUserInput();
if (!userInput || userInput.trim() === "") {
this.render("Session ended by input.");
break;
}
this.appendUserMessage(userInput);
try {
const assistantText =
await this.runInference();
this.appendAssistantMessage(
assistantText,
);
this.render(assistantText);
} catch (error) {
const message =
error instanceof Error
? error.message
: "Unknown inference error";
this.render(
`Inference failed: ${message}`,
);
}
turnCount += 1;
}
if (turnCount >= this.config.maxTurns) {
this.render(
"Session ended at max turn limit.",
);
}
}
private appendUserMessage(
content: string,
): void {
this.conversation.push({
role: "user",
content,
timestamp: Date.now(),
});
}
private appendAssistantMessage(
content: string,
): void {
this.conversation.push({
role: "assistant",
content,
timestamp: Date.now(),
});
}
private async runInference(): Promise<string> {
const response =
await this.modelClient.createMessage({
model: this.config.model,
maxTokens: this.config.maxTokens,
messages: this.conversation.map(
(message) => ({
role: message.role,
content: message.content,
}),
),
});
return response.text;
}
}
async function main(): Promise<void> {
const apiKey = process.env.OPENROUTER_API_KEY;
if (!apiKey) {
throw new Error(
"Missing OPENROUTER_API_KEY in environment",
);
}
const modelClient = new OpenRouterModelClient(
apiKey,
);
const readline = createInterface({
input,
output,
});
const getUserInput = async (): Promise<
string | null
> => {
try {
return await readline.question("You: ");
} catch {
return null;
}
};
const render = (text: string): void => {
console.log(`Assistant: ${text}`);
};
const agent = new Agent(
modelClient,
{
model: "openai/gpt-4o-mini",
maxTokens: 1024,
maxTurns: 20,
},
getUserInput,
render,
);
try {
await agent.run();
} finally {
readline.close();
}
}
main().catch((error) => {
const message =
error instanceof Error
? error.message
: "Unknown startup error";
console.error(`Fatal error: ${message}`);
process.exit(1);
});
Run it:
bun src/main.ts
Example conversation from a real run:
Assistant: Chat started. Submit empty input to exit.
You: hello world
Assistant: Hello! How can I assist you today?
You: what is the weather in london
Assistant: I don't have real-time data access to provide current weather information. However, you can easily check the weather in London by using a weather website, app, or a voice-activated assistant. If you're looking for typical weather conditions in London or historical data, I can help with that!
This result is expected in Part 1. The loop is working, but there are no tools yet, so the model cannot fetch live weather. Part 2 adds a custom tool so the agent can call external systems for real-time data.
What to Ignore for Now
To stay focused on the core idea, treat these as implementation details in Part 1:
- API client wiring (fetch setup and headers).
- Environment loading from .env (handled by the Bun runtime).
- Terminal wiring (readline).
The only required mental model in this part is still the same:
- Read input.
- Append to conversation state.
- Call model with full state.
- Append assistant response.
- Display response.
- Repeat until exit condition.
Why This Design Matters
1. Testability
Injected input, output, and model dependencies make Agent.run() easy to test with fakes, which means regressions in loop behavior are caught before deployment. In practice, this lowers the risk of breaking basic interaction flow when adding tools, retries, or routing logic later. Mock Service Worker is a practical choice for mocking LLM API responses.
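As a sketch of what such a test can look like, here is a condensed Agent driven entirely by fakes. The class is restated in simplified form so the example is self-contained; a real test would import the Agent from the implementation above instead.

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

// Condensed Agent with injected model, input, and output.
class Agent {
  private conversation: Message[] = [];
  constructor(
    private readonly callModel: (messages: Message[]) => Promise<string>,
    private readonly getUserInput: () => Promise<string | null>,
    private readonly render: (text: string) => void,
    private readonly maxTurns: number,
  ) {}

  async run(): Promise<void> {
    let turnCount = 0;
    while (turnCount < this.maxTurns) {
      const input = await this.getUserInput();
      if (!input || input.trim() === "") break;
      this.conversation.push({ role: "user", content: input });
      const reply = await this.callModel(this.conversation);
      this.conversation.push({ role: "assistant", content: reply });
      this.render(reply);
      turnCount += 1;
    }
  }
}

// Fakes: a scripted user, an echo model, and captured output.
const script = ["first", "second", ""]; // empty input ends the session
const rendered: string[] = [];

const agent = new Agent(
  async (messages) => `echo: ${messages[messages.length - 1].content}`,
  async () => script.shift() ?? null,
  (text) => rendered.push(text),
  10,
);
await agent.run();
console.log(rendered); // ["echo: first", "echo: second"]
```

No network, no terminal, no API key: the entire loop behavior is verified in memory, which is exactly what dependency injection buys here.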
2. Replaceable Model Providers
Provider portability protects the architecture from API churn, pricing changes, and model quality shifts. With ModelClient as an interface, the core loop remains stable while provider-specific code stays isolated in one adapter. In production systems, this isolation reduces migration cost and makes multi-provider fallback strategies realistic. It also makes it easier to switch providers or models based on use case.
3. Explicit Failure Handling
Explicit per-turn error handling improves runtime resilience. A single failed model call can be surfaced to the user without killing the process, which preserves session continuity and helps operators debug failure patterns. This becomes especially important when rate limits, transient network issues, and provider timeouts appear under load.
4. Operational Guardrails
A max-turn cap and clear exit paths are basic but important guardrails. They prevent unbounded loops, runaway token spend, and confusing terminal behavior during incidents. Production hardening starts with these deterministic limits, then expands to rate limits, retries with backoff, and structured logging.
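As a preview of that hardening, a retry-with-backoff wrapper can sit around the inference call without changing the loop itself. This is a sketch, not part of the Part 1 implementation; `withRetries` and its parameters are illustrative names.

```typescript
// Sketch: retry an async call with exponential backoff.
async function withRetries<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts - 1) break;
      // Exponential backoff: 250ms, then 500ms, then 1000ms, ...
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

// Usage: a call that fails twice, then succeeds on attempt three.
let attempts = 0;
const result = await withRetries(async () => {
  attempts += 1;
  if (attempts < 3) throw new Error("transient failure");
  return "ok";
});
console.log(result, attempts); // ok 3
```

Because the wrapper only touches the model call, the rest of the loop (input handling, state, rendering) stays untouched when this guardrail is added later.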
Common Pitfalls in Basic Agent Loops
Pitfall 1: Forgetting Full Context
If only the latest user message is sent each time, the assistant appears to forget prior turns. Beyond poor UX, this also breaks task continuity and can cause inconsistent outputs in multi-step workflows. Stateful conversation replay is what makes the system behave like a coherent session instead of unrelated single prompts.
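To make the failure concrete, compare what the model would receive in each case. Both payloads are hypothetical; only the second preserves prior turns.

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

const conversation: Message[] = [
  { role: "user", content: "My name is Ada." },
  { role: "assistant", content: "Nice to meet you, Ada." },
  { role: "user", content: "What is my name?" },
];

// Broken: only the latest message is sent, so the model never
// sees that a name was mentioned two turns ago.
const brokenPayload = {
  messages: [conversation[conversation.length - 1]],
};

// Correct: the full history is replayed on every call, so prior
// turns remain available to the model.
const correctPayload = { messages: conversation };

console.log(brokenPayload.messages.length); // 1
console.log(correctPayload.messages.length); // 3
```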
Pitfall 2: Mixing Business Logic and I/O
When stdin, rendering, and model calls are tightly coupled, testing and debugging become painful. Tight coupling also blocks reuse, for example, reusing the same loop in CLI, API, and background worker contexts. Separating concerns early keeps future integrations low-risk.
Pitfall 3: No Exit Strategy
Without turn limits and stop conditions, loops can run indefinitely. In production this translates directly into cost leaks and hard-to-diagnose runtime behavior. Deterministic stop rules are required for operational predictability.
Pitfall 4: Silent Failure Paths
If model errors are swallowed, the interface appears unresponsive with no clear reason. Silent failures are expensive during incident response because there is no signal for users or operators. Even minimal loops should return visible errors and leave room for structured logs later.
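A small sketch of the contrast, using a stand-in model call that always fails; the function names here are illustrative.

```typescript
// Stand-in model call that always fails, to make the contrast visible.
const flakyModelCall = async (): Promise<string> => {
  throw new Error("rate limited");
};

// Broken: the error is swallowed, so the user sees nothing and the
// loop appears to skip a turn with no explanation.
async function silentTurn(): Promise<string | null> {
  try {
    return await flakyModelCall();
  } catch {
    return null; // no signal for users or operators
  }
}

// Better: the failure becomes a visible message the loop can render,
// preserving the session and leaving a hook for structured logs later.
async function visibleTurn(): Promise<string> {
  try {
    return await flakyModelCall();
  } catch (error) {
    const message =
      error instanceof Error ? error.message : "Unknown inference error";
    return `Inference failed: ${message}`;
  }
}

console.log(await silentTurn()); // null
console.log(await visibleTurn()); // Inference failed: rate limited
```

The runnable implementation above takes the second path: every catch block produces a rendered message rather than a silent return.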
Lessons Learned
- The core loop is the foundation: behavior comes from control flow and state before tools are introduced.
- Stateless model APIs require explicit conversation persistence in application code.
- Clean boundaries between loop logic, model adapter, and I/O make future hardening much easier.
- Production readiness starts with simple controls: explicit errors, deterministic exits, and replaceable integrations.
What Comes Next in Part 2
In Part 2, the implementation adds a custom tool and the request/response handshake that lets the model ask the runtime to perform real actions. That is the step where the agent starts affecting the outside world.
Keeping this baseline code clean makes the next layers, tools and skills, easier to reason about and maintain.