
What makes OpenClaw powerful is surprisingly simple: it's a gateway that connects an AI agent to your messaging apps, gives it tools to interact with your computer, and lets it remember who you are across conversations.
The complexity comes from handling multiple channels simultaneously, managing persistent sessions, coordinating multiple agents, and making the whole thing reliable enough to run 24/7 on your machine.
In this post, I'll start from scratch and build up to OpenClaw's architecture step by step, showing how you could have invented it yourself from first principles, using nothing but a messaging API, an LLM, and the desire to make AI actually useful outside the chat window.
End goal: understand how persistent AI assistants work, so you can build your own.
First, let's establish the problem
When you use ChatGPT or Claude in a browser, there's a fundamental limitation: the AI lives behind a web interface. You go to it. It doesn't come to you.
Think about how you actually communicate day-to-day. You use WhatsApp, Telegram, Discord, Slack, iMessage. Your conversations happen there, not in some separate AI tab.
What if the AI just... lived in your messaging apps? What if you could text it like a friend and it could:
Remember your preferences and past conversations
Run commands on your computer
Browse the web for you
Send messages on your behalf
Wake up on a schedule and do recurring tasks
This is what OpenClaw does. It's an AI that lives where you already communicate, with access to your tools and your context.
Let's build one from scratch.
The Simplest Possible Bot
Let's start with the absolute minimum: an AI that responds to messages on Telegram.
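Here's a minimal sketch using only the standard library: it long-polls the Telegram Bot API and forwards each incoming message to the Anthropic Messages API. The environment variable names and model ID are my choices for illustration, not OpenClaw's.

```python
import json
import os
import urllib.request

TG = f"https://api.telegram.org/bot{os.environ.get('TELEGRAM_TOKEN', '')}"

def ask_llm(text):
    """One stateless call to the Anthropic Messages API."""
    req = urllib.request.Request(
        "https://api.anthropic.com/v1/messages",
        data=json.dumps({
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": text}],
        }).encode(),
        headers={
            "x-api-key": os.environ.get("ANTHROPIC_API_KEY", ""),
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["content"][0]["text"]

def main():
    offset = 0
    while True:  # long-poll Telegram for new messages
        with urllib.request.urlopen(f"{TG}/getUpdates?timeout=30&offset={offset}") as r:
            updates = json.load(r)["result"]
        for u in updates:
            offset = u["update_id"] + 1
            msg = u.get("message")
            if msg and "text" in msg:
                reply = ask_llm(msg["text"])
                data = json.dumps({"chat_id": msg["chat"]["id"], "text": reply}).encode()
                urllib.request.urlopen(urllib.request.Request(
                    f"{TG}/sendMessage", data=data,
                    headers={"content-type": "application/json"}))

if __name__ == "__main__":
    main()
```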
Run it, send a message on Telegram, and the AI responds. Simple.
But this is basically a worse version of the Claude web interface. Every message is independent. No memory. No tools. No personality.
What if we gave it memory?
Goal: Persistent Sessions
A problem with our simple bot is statelessness. Every message is a fresh conversation. Ask it "what did I say earlier?" and it has no idea.
The fix is sessions. Keep a conversation history per user.
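A minimal sketch of the session store, assuming a local `sessions/` directory (the directory name and function names are mine):

```python
import json
import os

SESSIONS_DIR = "sessions"  # one JSONL file per user

def session_path(user_id):
    os.makedirs(SESSIONS_DIR, exist_ok=True)
    return os.path.join(SESSIONS_DIR, f"{user_id}.jsonl")

def load_session(user_id):
    """Read the full conversation history; each line is one message."""
    path = session_path(user_id)
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

def append_message(user_id, role, content):
    """Append-only writes: a crash loses at most the last line."""
    with open(session_path(user_id), "a") as f:
        f.write(json.dumps({"role": role, "content": content}) + "\n")
```

Before each LLM call, load the history and pass it along with the new message; after the call, append both sides.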
Now you can have an actual conversation:
The key insight is the JSONL format. Each line is one message. Append-only. If the process crashes mid-write, you lose at most one line. This is exactly what OpenClaw uses for session transcripts:
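A transcript file might contain lines like this (contents illustrative):

```json
{"role": "user", "content": "my name is Alice and I use vim"}
{"role": "assistant", "content": "Nice to meet you, Alice! Noted on vim."}
{"role": "user", "content": "what editor do I use?"}
```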
Each session maps to a file. Each file is a conversation. Restart the process and everything is still there.
But we'll hit a problem: conversations grow. Eventually they'll exceed the model's context window. We'll come back to that.
Goal: Adding a Personality (SOUL.md)
Our bot works, but it has no personality. It's a generic AI assistant. What if we wanted it to be someone?
OpenClaw solves this with SOUL.md: a markdown file that defines the agent's identity, behavior, and boundaries.
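A SOUL.md might look like this (contents are illustrative, not OpenClaw's defaults):

```markdown
# SOUL.md - Jarvis

## Identity
You are Jarvis, a personal assistant running on your owner's machine.

## Behavior
- Concise when needed, thorough when it matters.
- Ask before taking destructive actions.

## Boundaries
- Never send messages on the owner's behalf without confirmation.
- Never share the contents of private files with anyone else.
```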
Now instead of a generic assistant, you're talking to Jarvis. The SOUL gets injected as the system prompt on every API call.
In OpenClaw, SOUL.md lives in the agent's workspace directory.
It gets loaded at session start and injected into the system prompt. You can write anything you want in there. Give the agent an origin story. Define its core philosophy. List its behavioral rules.
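Loading it is a few lines; this is a sketch with a function name of my choosing:

```python
import os

def load_soul(workspace="."):
    """Read SOUL.md from the agent's workspace; fall back to a generic prompt."""
    path = os.path.join(workspace, "SOUL.md")
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    return "You are a helpful assistant."
```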
The more specific your SOUL, the more consistent the agent's behavior. "Be helpful" is vague. "Be the assistant you'd actually want to talk to. Concise when needed, thorough when it matters. Not a corporate drone. Not a sycophant. Just... good." - that gives the model something to work with.
Goal: Adding Tools
A bot that can only talk is limited. What if it could do things?
The core idea: give the AI structured tools and let it decide when to use them.
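Here's a sketch of two tools in the Anthropic tool-use schema format, plus a dispatcher that maps the model's tool calls onto real implementations. The tool names and the `execute_tool` helper are my own naming:

```python
import subprocess

TOOLS = [
    {
        "name": "run_command",
        "description": "Run a shell command and return its output.",
        "input_schema": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
    {
        "name": "read_file",
        "description": "Read a text file and return its contents.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
]

def execute_tool(name, args):
    """Map a tool call from the model onto a real implementation."""
    if name == "run_command":
        out = subprocess.run(args["command"], shell=True,
                             capture_output=True, text=True, timeout=60)
        return out.stdout + out.stderr
    if name == "read_file":
        with open(args["path"]) as f:
            return f.read()
    return f"unknown tool: {name}"
```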
Now we need the agent loop. When the AI wants to use a tool, we execute it and feed the result back:
Now we update handle_message to use the agent loop instead of calling the API directly:
Now you can text your bot something like "what's in my notes file, and how much disk space do I have left?" The AI decides which tools to use and in what order, then synthesizes the results into a natural response. All through a Telegram message.
OpenClaw's production tool catalog is much larger - browser automation, inter-agent messaging, sub-agent spawning, and more - but every tool follows this exact pattern: a schema, a description, and an execution function.
Goal: Permission Controls
We're executing commands from Telegram messages. That's terrifying. What if someone gets access to your Telegram account and tells the bot to rm -rf /?
We need a permission system. OpenClaw's approach: an approval allowlist that remembers what you've approved.
We add these helpers alongside our existing code:
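A sketch of the allowlist, using the `exec-approvals.json` filename mentioned below; the safe-command set and helper names are my own:

```python
import json
import os
import shlex

SAFE_COMMANDS = {"ls", "pwd", "date", "whoami", "echo", "cat", "git"}
APPROVALS_FILE = "exec-approvals.json"

def load_approvals():
    if os.path.exists(APPROVALS_FILE):
        with open(APPROVALS_FILE) as f:
            return set(json.load(f))
    return set()

def save_approval(command):
    """Persist an approval so the user is never asked twice."""
    approvals = load_approvals()
    approvals.add(command)
    with open(APPROVALS_FILE, "w") as f:
        json.dump(sorted(approvals), f)

def is_allowed(command):
    """Safe binaries run freely; anything else needs a recorded approval."""
    binary = shlex.split(command)[0] if command.strip() else ""
    return binary in SAFE_COMMANDS or command in load_approvals()
```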
Now update the run_command case in execute_tool to check permissions before executing:
When a command is safe or previously approved, it runs immediately. When it's not, the agent gets told "permission denied" and can try a different approach. The approval gets persisted to exec-approvals.json, so you're never asked twice for the same command.
OpenClaw extends this with glob patterns (approve git * once) and a three-tier model: "ask" (prompt user), "record" (log but allow), and "ignore" (auto-allow).
Goal: The Gateway
Here's where it gets interesting. So far we have a Telegram bot. But what if you also want the AI on Discord? And WhatsApp? And Slack?
You could write separate bots for each platform. But then you'd have separate sessions, separate memory, separate configurations. The AI on Telegram wouldn't know what you discussed on Discord.
The solution: a gateway. One central process that manages all channels.
Look at what we already have. Our run_agent_turn function doesn't know anything about Telegram. It takes messages and returns text. That's the key - the agent logic is already decoupled from the channel.
To prove it, let's add a second interface. We'll add a simple HTTP API alongside our Telegram bot, both talking to the same agent and the same sessions:
Try it out: tell the bot your name on Telegram, then query via HTTP using the same user ID (your Telegram user ID) to prove the session is shared.
Same agent, same sessions, same memory. Two different interfaces. That's the gateway pattern.
The next step would be making this config-driven - a JSON file specifying which channels to start and how to authenticate them.
That's what OpenClaw does: its gateway manages Telegram, Discord, WhatsApp, Slack, Signal, iMessage, and more, all through a single config file. It also supports configurable session scoping - per-user, per-channel, or a single shared session - so the same person gets a unified experience across channels. We'll keep our simple user-ID-as-session-key approach for now.
Goal: Context Compaction
Remember the growing session problem we flagged earlier? After chatting with your bot for weeks, the session file has thousands of messages. The total token count exceeds the model's context window. Now what?
The fix: summarize old messages, keep recent ones. Add these two functions alongside your existing code:
Now add the compaction check at the top of handle_message, right after loading the session:
Try it out: To test compaction without chatting for hours, temporarily lower the threshold:
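With my sketch's constant, that's one line:

```python
COMPACTION_THRESHOLD = 2_000  # temporarily tiny, so compaction triggers fast
```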
Have a conversation of 10-15 messages, then watch the old messages get replaced with a summary. The bot still remembers key facts, but the session file is much smaller.
OpenClaw's compaction is more sophisticated - it splits messages into chunks by token count, summarizes each chunk separately, and includes a safety margin for estimation inaccuracy - but the core idea is identical.
Goal: Long-Term Memory
Session history gives you conversation memory. But what happens when you reset a session or start a new one? Everything is gone.
We need a separate memory system - persistent knowledge that survives session resets. The approach: give the agent tools to save and search memories stored as files.
Add these two tools to the TOOLS list:
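Two more entries in the same tool-schema format (names and fields are my sketch); extend the list with `TOOLS += MEMORY_TOOLS`:

```python
MEMORY_TOOLS = [
    {
        "name": "save_memory",
        "description": "Save an important fact or preference to long-term memory.",
        "input_schema": {
            "type": "object",
            "properties": {"content": {"type": "string"}},
            "required": ["content"],
        },
    },
    {
        "name": "search_memory",
        "description": "Search long-term memory by keyword.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
]
```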
Add their cases to execute_tool:
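The implementations are small files plus a keyword scan; the directory name and helpers are my own sketch. In `execute_tool`, the new cases just call these functions with `args["content"]` and `args["query"]`:

```python
import os
import time

MEMORY_DIR = "memory"  # one small markdown file per memory

def save_memory(content):
    os.makedirs(MEMORY_DIR, exist_ok=True)
    name = f"{int(time.time() * 1000)}.md"
    with open(os.path.join(MEMORY_DIR, name), "w") as f:
        f.write(content)
    return f"saved as {name}"

def search_memory(query):
    """Naive keyword match over memory files; OpenClaw uses vector search here."""
    if not os.path.isdir(MEMORY_DIR):
        return "no matches"
    words = query.lower().split()
    hits = []
    for name in sorted(os.listdir(MEMORY_DIR)):
        with open(os.path.join(MEMORY_DIR, name)) as f:
            text = f.read()
        if any(w in text.lower() for w in words):
            hits.append(f"[{name}] {text}")
    return "\n".join(hits) or "no matches"
```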
Finally, update the SOUL so the agent knows about memory:
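Something like this added to SOUL.md works (wording illustrative):

```markdown
## Memory
You have save_memory and search_memory tools. When the user tells you
something worth remembering, save it. Search memory before saying
"I don't know" or asking the user to repeat themselves.
```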
Try it out: tell the bot something worth remembering, then reset the session, restart the bot, and ask about it.
The memory persists because it's stored in files, not in the session. Reset the session, restart the bot - the memories are still there.
OpenClaw's production memory uses vector search with embeddings for semantic matching (so "auth bug" matches "authentication issues"), but our keyword search works well for getting started.
Goal: Command Queue
Here's a subtle but critical problem: what happens when two messages arrive at the same time?
Say you send "check my calendar" on Telegram and "what's the weather" via the HTTP API simultaneously. Both try to load the same session, both try to append to it, and you get corrupted data.
The fix is simple: a per-session lock. Only one message processes at a time for each session. Different sessions can still run in parallel.
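The setup is a few lines; the helper name is mine:

```python
import threading
from collections import defaultdict

_locks_guard = threading.Lock()       # protects the lock table itself
_locks = defaultdict(threading.Lock)  # one lock per session key

def session_lock(session_key):
    """Return the (lazily created) lock for this session."""
    with _locks_guard:
        return _locks[session_key]
```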
Now wrap the body of handle_message with the lock:
Do the same inside the /chat HTTP endpoint, wrapping its session access in the same lock.
That's it - five lines of setup. Messages for the same user queue up. Messages for different users run in parallel. No race conditions.
OpenClaw extends this with lane-based queues (separate lanes for messages, cron jobs, and sub-agents) so heartbeats never block real-time conversations.
Goal: Cron Jobs (Heartbeats)
So far our agent only responds when you talk to it. But what if you want it to check your email every morning? Or summarize your calendar before meetings?
You need scheduled execution. Let's add heartbeats - recurring tasks that trigger the agent on a timer.
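Here's a deliberately tiny daily scheduler rather than full cron; the tuple format and names are my sketch, and `handle_message` is the handler from earlier:

```python
import threading
import time
from datetime import datetime

# (name, hour, minute, prompt)
HEARTBEATS = [
    ("morning-briefing", 7, 30,
     "Read today's calendar file and send me a short briefing."),
]

def is_due(now, hour, minute):
    return (now.hour, now.minute) == (hour, minute)

def heartbeat_loop():
    fired = {}
    while True:
        now = datetime.now()
        for name, hour, minute, prompt in HEARTBEATS:
            key = (name, now.date())
            if is_due(now, hour, minute) and key not in fired:
                fired[key] = True
                # Own session key keeps cron chatter out of the main conversation
                print(f"[{name}]", handle_message(f"cron:{name}", prompt))
        time.sleep(30)

def start_heartbeats():
    threading.Thread(target=heartbeat_loop, daemon=True).start()
```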
The key insight: each heartbeat uses its own session key (cron:morning-briefing). This keeps scheduled tasks from cluttering your main conversation history. The heartbeat calls the same run_agent_turn function - it's just another message, triggered by a timer instead of a human.
Try it out: for testing, change the schedule so the heartbeat fires a minute or two from now. You'll see it fire in your terminal, and the agent will respond. Change it back to the daily schedule when you're done testing.
OpenClaw supports full cron expressions (30 7 * * *) and routes heartbeats through a separate command queue lane so they never block real-time messages.
Goal: Multi-Agent
One agent is useful. But as you add more tasks, you'll find a single personality and toolset can't cover everything well. A research assistant needs different instructions than a general assistant.
The fix: multiple agent configurations with routing. Each agent has its own SOUL, its own session, and you switch between them based on the message.
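A sketch of the registry and router; the agent names, SOUL paths, and keyword routing are my choices (OpenClaw's routing is configurable):

```python
AGENTS = {
    "jarvis": {"soul_file": "souls/jarvis.md", "keywords": []},  # default
    "scout": {"soul_file": "souls/scout.md",
              "keywords": ["research", "look up", "find out"]},
}

def route_agent(text):
    """Pick an agent by keyword; fall back to the default assistant."""
    lowered = text.lower()
    for name, cfg in AGENTS.items():
        if any(k in lowered for k in cfg["keywords"]):
            return name
    return "jarvis"
```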
Update handle_message to route messages to the right agent:
Try it out: send "research the history of the Telegram Bot API" and Scout answers; send "good morning" and you're back to Jarvis.
Each agent has its own conversation history, but they share the same memory directory. Scout saves research findings; Jarvis can search for them later. They collaborate through shared files without needing direct messaging.
OpenClaw extends this with sub-agent spawning (a parent agent can spawn a child for a focused task) and inter-agent messaging, but the core pattern is the same: each agent is just a SOUL + session + tools.
Putting It All Together
Let's combine everything we've built into a single runnable script. This is a clean standalone REPL that includes every feature from the tutorial: sessions, SOUL, tools, permissions, compaction, memory, command queue, cron, and multi-agent routing.
I've put together a mini-openclaw in ~400 lines of code here:
https://gist.github.com/dabit3/86ee04a1c02c839409a02b20fe99a492
Save this as mini-openclaw.py and run it:
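Assuming the script reads the API key from the environment:

```bash
export ANTHROPIC_API_KEY=...   # your Anthropic API key
python mini-openclaw.py
```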
In a typical session, memory persists across restarts, agents collaborate through shared memory files, commands require approval, and heartbeats run in the background. All in ~400 lines.
What We've Learned
Starting from a simple Telegram bot, we built every major component of a persistent AI assistant:
Persistent sessions (JSONL files): Crash-safe conversation memory. Each session is one file, each line is one message. Restart the process and everything is still there.
SOUL.md (system prompt): A personality file that transforms a generic AI into a specific agent with consistent behavior, boundaries, and style.
Tools + Agent loop: Structured tool definitions that let the AI decide when to act. The agent loop calls the LLM, executes any requested tools, feeds results back, and repeats until done.
Permission controls: An allowlist of safe commands plus persistent approvals, so dangerous operations require explicit consent.
The gateway pattern: One central agent with multiple interfaces. Telegram, HTTP, or any other channel - they all talk to the same sessions and the same memory.
Context compaction: When conversations outgrow the context window, summarize old messages and keep recent ones. The bot keeps its knowledge without hitting token limits.
Long-term memory: File-based storage with save and search tools. Knowledge that survives session resets, accessible to any agent.
Command queue: Per-session locking to prevent race conditions when multiple messages arrive simultaneously.
Heartbeats: Scheduled agent runs on a timer, each with its own isolated session. The agent wakes up, does its task, and goes back to sleep.
Multi-agent routing: Multiple agent configurations with different SOULs and session keys, routed by message content. Agents collaborate through shared memory files.
Each of these emerged from a practical problem:
"The AI can't remember anything" → Sessions
"It responds like a generic chatbot" → SOUL.md
"It can only talk, not act" → Tools + Agent loop
"It runs dangerous commands without asking" → Permission controls
"I want it on all my messaging apps" → Gateway
"The conversation got too long" → Compaction
"It forgets things between sessions" → Memory
"Two messages at once corrupt the data" → Command queue
"I want it to do things automatically" → Heartbeats
"One agent can't do everything well" → Multi-agent
This is how you could have invented OpenClaw.
Going Further
Our prototype covers the core architecture. Here's how OpenClaw extends each idea for production use - features worth exploring once you've outgrown the basics.
Browser with Semantic Snapshots
Most AI assistants can't see the web. OpenClaw gives the agent a browser via Playwright, but instead of sending screenshots (5MB each, expensive in tokens), it uses semantic snapshots - a text representation of the page's accessibility tree:
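The exact format is OpenClaw's own, but a snapshot of a simple page looks roughly like this (illustrative):

```
- link "Sign in" [ref=1]
- heading "Example Domain"
- paragraph "This domain is for use in illustrative examples in documents."
- textbox "Search" [ref=2]
```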
Each interactive element gets a numbered ref ID. When the agent wants to click something, it says "click ref=1" - which maps to exactly one element on the page. No guessing, no "click the blue button near the top." And since the snapshot is text instead of an image, it's roughly 100x smaller than a screenshot, which means far fewer tokens per page.
Session Scoping & Identity Links
Our prototype uses user ID as the session key. OpenClaw supports configurable scoping:
main (default): All DMs share one session — simple, great for single-user setups.
per-peer: Each person gets one session across all channels.
per-channel-peer: Each person per channel gets their own session.
Identity links let you merge sessions across channels for the same person, so Alice's Telegram and Discord conversations share the same history.
Channel Plugin System
Our prototype hardcodes Telegram + HTTP. OpenClaw uses a plugin architecture where each channel (Telegram, Discord, WhatsApp, Slack, Signal, iMessage) is a separate adapter that normalizes messages into a common format. Adding a new channel means writing one adapter, not touching any agent logic.
Vector Memory Search
Our keyword search works, but misses semantic matches ("auth bug" won't match "authentication issues"). OpenClaw's production memory uses a hybrid approach: vector search via SQLite with embedding extensions for semantic similarity, plus FTS5 for exact keyword matches. Configurable embedding providers include OpenAI, local models, Gemini, and Voyage.
Sub-agent Spawning
Our multi-agent setup uses manual routing. OpenClaw lets agents spawn sub-agents programmatically - a parent agent calls sessions_spawn, the child runs in its own context with a timeout, and returns results to the parent. This enables patterns like "research this topic in depth" where the main agent delegates to a specialist and continues when it's done.
Next Steps
If you want to build your own:
Start with one channel: get a Telegram or Discord bot working with sessions
Add tools incrementally: start with file read/write, then add shell execution
Add memory when you need it: once sessions reset, you'll want persistent memory
Add channels when you outgrow one: the gateway pattern emerges naturally
Add agents when tasks specialize: don't start with 10 agents, start with 2
Or just use OpenClaw. It's open source and handles all the edge cases we glossed over. But now you know how it works under the hood.
Link: http://x.com/i/article/2021347850656022528