Learning Lab · 5 min read

Building AI Agents: Architecture Patterns, Tool Calling, and Memory Management

Learn how to build production-ready AI agents by mastering tool calling contracts, structuring agent loops correctly, and separating memory into session, knowledge, and execution layers. Includes working Python code examples.

AI Agent Architecture: Tool Calling and Memory Patterns

You built a chatbot that answers questions. Now you need it to do something — fetch data, call an API, update a database. The difference between a chatbot and an agent is a single constraint: agents take actions based on what they learn.

Most attempts fail because developers treat tool calling as an add-on, not as the core of the system. They call an LLM, wait for a response, pass tools as an afterthought. Production agents need a different architecture — one that treats the LLM as a decision engine, not a text generator.

Tool Calling: The Contract, Not the Feature

Tool calling isn’t about giving an LLM access to functions. It’s about defining a contract the LLM must follow.

When you define a tool, you’re not giving the model a black box. You’re specifying:

  • What the tool does (description)
  • What parameters it requires (schema)
  • What format it returns (output specification)

Most tool calling fails because descriptions are vague. A description like "Get user information" loses the model immediately: get which information? What does the function signature look like? What happens if the user doesn't exist?

Here’s what a bad tool definition looks like:

{
  "name": "get_user",
  "description": "Get user information",
  "parameters": {
    "type": "object",
    "properties": {
      "user_id": {
        "type": "string"
      }
    }
  }
}

The LLM doesn’t know what happens when user_id is invalid. It doesn’t know if user_id should be a UUID or an integer. It doesn’t know what fields the response contains.

Here’s the improved version:

{
  "name": "get_user_profile",
  "description": "Retrieve a user's profile by ID. Returns basic account info including name, email, creation date, and account status. Returns null if user not found.",
  "parameters": {
    "type": "object",
    "properties": {
      "user_id": {
        "type": "string",
        "description": "UUID of the user. Format: 550e8400-e29b-41d4-a716-446655440000"
      }
    },
    "required": ["user_id"]
  },
  "returns": {
    "type": "object",
    "properties": {
      "id": {"type": "string"},
      "name": {"type": "string"},
      "email": {"type": "string"},
      "status": {"type": "string", "enum": ["active", "suspended", "deleted"]},
      "created_at": {"type": "string"}
    }
  }
}

Newer Claude models follow tool schemas noticeably more consistently than earlier generations, but only when those schemas are specific. Vague definitions still confuse them; that's not a model limitation, that's a design failure.

The Loop: Making Decisions Sequential

An agent loop is simple in structure but broken in almost every first implementation.

The basic flow: LLM decides → tool executes → result returns → LLM decides again → repeat until done.

Here’s a working Python example using Claude:

import anthropic
import json

client = anthropic.Anthropic()
tools = [
    {
        "name": "fetch_order",
        "description": "Get order details by order ID",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "Unique order identifier"
                }
            },
            "required": ["order_id"]
        }
    },
    {
        "name": "update_order_status",
        "description": "Update an order's status",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "status": {
                    "type": "string",
                    "enum": ["pending", "shipped", "delivered"]
                }
            },
            "required": ["order_id", "status"]
        }
    }
]

messages = [{"role": "user", "content": "Check order ABC123 and mark it as shipped"}]

while True:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        tools=tools,
        messages=messages
    )
    
    if response.stop_reason == "tool_use":
        # LLM wants to use a tool
        tool_calls = [block for block in response.content if block.type == "tool_use"]
        
        messages.append({"role": "assistant", "content": response.content})
        
        tool_results = []
        for tool_call in tool_calls:
            # Execute tool (stubbed here)
            if tool_call.name == "fetch_order":
                result = {"id": "ABC123", "status": "pending", "total": 99.99}
            elif tool_call.name == "update_order_status":
                result = {"success": True, "new_status": "shipped"}
            else:
                # Guard against unknown tool names so result is always bound
                result = {"error": f"Unknown tool: {tool_call.name}"}
            
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": tool_call.id,
                "content": json.dumps(result)
            })
        
        messages.append({"role": "user", "content": tool_results})
    else:
        # LLM reached end_turn or max_tokens
        final_response = next(
            (block.text for block in response.content if hasattr(block, "text")),
            None
        )
        print(final_response)
        break

The critical mistake most developers make: they treat tool results as unstructured text. If a tool returns JSON, parse it and make the structure explicit to the LLM. Don’t force it to parse messy strings.
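One way to enforce that rule, sketched here as a hypothetical format_tool_result helper (not part of any SDK): parse the tool's raw output, and only fall back to the raw string when it isn't valid JSON.

```python
import json

def format_tool_result(raw: str) -> str:
    """Parse a tool's raw output and re-serialize it so structure is explicit.

    Falls back to the raw string when the output isn't valid JSON.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return raw  # not JSON: pass through unchanged
    # Indented output makes field boundaries obvious to the model
    return json.dumps(data, indent=2)

print(format_tool_result('{"id": "ABC123", "status": "pending"}'))
```

Drop this between tool execution and the tool_result message, so the model always sees clean, consistently formatted structure.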

Memory: What Agents Actually Need to Remember

Memory is not conversation history. That’s the first thing to unlearn.

An agent needs three types of memory:

  • Session memory: What happened in this conversation — user goals, context from previous turns. This is short-term and conversation-specific.
  • Knowledge memory: Facts about the user, domain, or system state that persist across conversations. This is long-term and shared.
  • Execution memory: What the agent has already tried, what failed, what succeeded. This prevents loops and repeated errors.

Most systems conflate all three into a message history. That kills performance.

Session memory should live in the message array — but summarized, not raw. After 20 turns, compress earlier context into a single system message instead of keeping all 20 turns in context.
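A minimal sketch of that compression step. The summarize callable is a stand-in for whatever produces the summary (typically another LLM call); keep_last and the message shape are assumptions, not a fixed API.

```python
def compress_history(messages, summarize, keep_last=6):
    """Collapse older turns into one summary message; keep recent turns raw.

    summarize is a stand-in for an LLM call that turns a list of messages
    into a short text summary. keep_last controls how many raw turns survive.
    """
    if len(messages) <= keep_last:
        return messages
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary_msg = {
        "role": "user",
        "content": f"[Summary of earlier conversation] {summarize(older)}",
    }
    return [summary_msg] + recent
```

With the Anthropic API, the summary could instead go into the top-level system parameter rather than a user message; either way, the raw turns stop growing without bound.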

Knowledge memory should be separate — a vector database (Pinecone, Weaviate) or a structured key-value store. When you need user context, fetch it explicitly with a tool call, don’t stuff it into the initial prompt.
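What "fetch it with a tool call" can look like, as a sketch: a hypothetical lookup_user_context tool backed here by a plain dict standing in for a real vector database or key-value store. The tool name, topics, and stored facts are all invented for illustration.

```python
# Hypothetical tool the agent calls when it needs user context,
# instead of having that context stuffed into the initial prompt.
knowledge_tool = {
    "name": "lookup_user_context",
    "description": "Retrieve stored facts about the current user by topic. "
                   "Returns a list of fact strings; empty list if none stored.",
    "input_schema": {
        "type": "object",
        "properties": {
            "topic": {
                "type": "string",
                "description": "What to look up, e.g. 'payment_history'",
            }
        },
        "required": ["topic"],
    },
}

# Dict-backed stand-in for the real store (Pinecone, Weaviate, etc.)
_KNOWLEDGE = {"payment_history": ["Card on file expired 2024-12"]}

def lookup_user_context(topic: str) -> list[str]:
    """Handler the agent loop dispatches to when the LLM calls the tool."""
    return _KNOWLEDGE.get(topic, [])
```

The point of the shape: the LLM decides when context is needed, and the store stays out of the prompt until then.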

Execution memory should be an explicit log. Before the agent tries a tool, check if it’s already attempted that tool in this session. If it failed last time, pass that failure to the LLM as context.

Example structure:

{
  "session_id": "conv_12345",
  "user_goal": "Update billing address and confirm new payment method",
  "session_context": "User has active subscription. Previously tried to update payment in December but process failed.",
  "execution_log": [
    {"tool": "fetch_user_profile", "status": "success", "timestamp": "2025-01-15T10:22:00Z"},
    {"tool": "validate_address", "status": "failed", "error": "Postal code invalid", "timestamp": "2025-01-15T10:22:15Z"}
  ],
  "knowledge_refs": ["user_payment_history", "subscription_terms"],
  "messages": [
    {"role": "user", "content": "Update my address..."},
    {"role": "assistant", "content": "I'll help with that..."}
  ]
}
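A hypothetical helper over an execution log in that shape: before dispatching a tool, surface any earlier failure of the same tool so the LLM can adjust instead of repeating it.

```python
def prior_failure(execution_log, tool_name):
    """If tool_name already failed this session, return a note for the LLM."""
    for entry in reversed(execution_log):
        if entry["tool"] == tool_name and entry["status"] == "failed":
            return f"Note: {tool_name} failed earlier with: {entry['error']}"
    return None
```

Inject the returned note into the conversation (for example, prepended to the next tool result) so the failure is explicit context, not something the model has to infer from history.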

Do This Today

Pick one tool your agent needs to call. Write out the schema with a 3-sentence description, list every parameter with its format and constraints, and define the exact shape of the response. Test it by hand — feed the schema to Claude or GPT-4o and ask it to call the tool. If it calls it wrong, your schema is incomplete.
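While testing by hand, a deliberately minimal checker like this hypothetical check_tool_call can catch the obvious failure modes (missing required parameters, unknown names, wrong types). It is an approximation, not a full JSON Schema validator.

```python
def check_tool_call(schema: dict, arguments: dict) -> list[str]:
    """Return a list of problems with an LLM's tool-call arguments.

    Checks required keys, known parameter names, and string types only;
    a real system would use a full JSON Schema validator.
    """
    params = schema["parameters"]
    props = params["properties"]
    errors = []
    for name in params.get("required", []):
        if name not in arguments:
            errors.append(f"missing required parameter: {name}")
    for name, value in arguments.items():
        if name not in props:
            errors.append(f"unexpected parameter: {name}")
        elif props[name].get("type") == "string" and not isinstance(value, str):
            errors.append(f"parameter {name} should be a string")
    return errors
```

An empty list means the call at least matches the contract; anything else tells you exactly which part of the schema the model misread, which usually points at a gap in your description.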

Batikan
