Agent Memory Architecture: Why Documentation is Agent RAM

This is Part 2 of the Agent Memory Architecture series. In Part 1, we established that a 100k-line codebase needs roughly 25,000 lines of codified context (a 24% ratio) to prevent agent amnesia. But you cannot just dump 25,000 lines into a single prompt. Here is how you actually organize it.

Documentation is no longer an onboarding tool for humans. It is load-bearing infrastructure. If you build with AI agents, your documentation functions as the external RAM for the model.

A recent study titled Codified Context demonstrates that scaling agents requires a memory management strategy. The developers used Claude Code to build a 108,000-line C# multiplayer simulation. To manage the required 25,000 lines of instructions without degrading the model's reasoning or exploding the context window, they split the context into a three-tier architecture.

Traditional narrative documentation is replaced by high-density, constraint-heavy files. If a rule is not explicitly codified in this system, it does not exist to the agent.

How to structure Agent Memory (The 3-Tier Model)

Effective agent memory requires a tiered approach that balances context window limits with retrieval accuracy. You organize the 25,000 lines explicitly in the repository:

/context
  /hot
    - constitution.md        (Loaded globally)
  /warm
    - network-expert.md      (Loaded on demand)
    - physics-expert.md      (Loaded on demand)
  /cold
    - binary-packet-spec.md  (Queried via MCP tool)
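This layout maps directly onto prompt assembly. A minimal loader might look like the sketch below (the `build_prompt` function and its file layout are illustrative assumptions, not code from the study):

```python
from pathlib import Path

def build_prompt(context_root: Path, task_domain: str) -> str:
    """Assemble the agent prompt from the tiered layout: the hot-tier
    constitution is always loaded; a warm-tier expert file is added only
    when one exists for the current task domain; cold-tier specs are
    never inlined -- they stay behind a search tool."""
    sections = [(context_root / "hot" / "constitution.md").read_text()]
    expert = context_root / "warm" / f"{task_domain}-expert.md"
    if expert.exists():
        sections.append(expert.read_text())
    return "\n\n".join(sections)
```

The key property is asymmetry: hot memory is unconditional, warm memory is conditional, and cold memory never appears in this function at all.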

Tier 1: Hot Memory (The Global System Prompt)

The Hot tier represents the global instructions loaded for every single action the agent takes.

In the study, this was a strict 660-line markdown file (constitution.md) that remained in the context window at all times. It is the core operating system of your agent.

It does not contain feature logic. It contains orchestration protocols: the rules that tell the agent how to use tools and when to look things up. For example, it dictates global invariants such as "All systems must be stateless" and "If you lack context for a packet structure, call the MCP search tool immediately." It prevents the agent from drifting away from the project's fundamental philosophy.
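A hot-tier file of this kind might open with entries like the following (an illustrative sketch, not the study's actual constitution.md):

```markdown
## Global Invariants
- All systems MUST be stateless; state lives only in components.
- NEVER guess a packet layout. If a structure is unknown, call the
  MCP search tool against /context/cold before writing code.

## Tool Protocol
- Load the matching /context/warm expert file before editing a subsystem.
- Cite the spec file you retrieved for every binary format you emit.
```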

Tier 2: Warm Memory (Dynamic Domain Personas)

The Warm tier represents task-specific prompts injected into the context window only when the agent works on a relevant module.

Instead of one massive prompt, the researchers built 19 specialized domain-expert agents totaling 9,300 lines. These files are instruction sets for specific subsystems. When the agent identifies it is working on the networking stack, it loads the network-expert.md module.

You do not load the network expert when you are asking the agent to fix the rendering engine. These files dictate local boundaries (e.g., "You are the Physics Expert. Never update Position directly; apply velocity."). This keeps the reasoning path clean and prevents the model from hallucinating physics logic into a network packet.
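Selecting which warm-tier persona to inject can be as simple as matching the files a task touches against subsystem prefixes. A hypothetical router (the path prefixes and expert filenames are illustrative):

```python
# Hypothetical mapping from subsystem path prefixes to the warm-tier
# expert file that should be injected when that subsystem is touched.
DOMAIN_EXPERTS = {
    "src/network/": "network-expert.md",
    "src/physics/": "physics-expert.md",
    "src/render/":  "render-expert.md",
}

def experts_for_task(touched_files: list[str]) -> set[str]:
    """Return only the warm-tier modules relevant to the current task,
    based on which subsystems its files belong to."""
    loaded = set()
    for path in touched_files:
        for prefix, expert in DOMAIN_EXPERTS.items():
            if path.startswith(prefix):
                loaded.add(expert)
    return loaded
```

A rendering task loads only the render expert; a task that spans two subsystems loads both, and nothing else.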

Tier 3: Cold Memory (Static Data and Specs)

The Cold tier represents massive, static datasets that are never loaded into the prompt by default. They are queried on demand via Model Context Protocol (MCP) tools.

This is your external hard drive. In the study, this was a repository of 34 specification documents consisting of 16,250 lines of text.

A file like binary-packet-spec.md might contain thousands of lines of bit-offsets and hexadecimal definitions. The agent does not need to memorize this; it needs to be able to look it up. When the agent encounters a specific data structure it does not recognize, it uses a tool to search the Cold Memory, pulling only the 5 lines it needs to parse a single packet.
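The lookup behind that tool call can be as simple as a windowed grep over the spec text. This `search_cold_memory` helper is a sketch of the idea, not the study's actual MCP implementation:

```python
def search_cold_memory(spec_text: str, query: str, window: int = 2) -> str:
    """Return only the lines surrounding the first match for `query`,
    never the whole specification. This is the cold-tier contract:
    thousands of lines stay on disk; a handful reach the prompt."""
    lines = spec_text.splitlines()
    for i, line in enumerate(lines):
        if query.lower() in line.lower():
            start = max(0, i - window)
            return "\n".join(lines[start:i + window + 1])
    return ""
```

With `window=2`, a hit inside a 16,000-line spec costs the agent five lines of context instead of sixteen thousand.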

Why traditional documentation fails AI agents

Traditional documentation is explanatory and narrative-heavy, which leads to retrieval noise and hallucination. Agents do not need the history of a decision or a friendly introduction. They need explicit boundaries and direct constraint mapping.

Writing for MCP retrieval changes your formatting requirements entirely. Narrative paragraphs are useless. You need strict bullet points, explicit type definitions, and direct constraint mapping. You must think about how an MCP server chunks your text. If a rule and its exception are separated by three paragraphs of fluff, the retrieval step might grab the rule and leave the exception behind.
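The chunking hazard is easy to see concretely: a heading-based chunker keeps everything under one heading together, so a rule and its exception survive retrieval only if they share a block. A toy chunker (not any real MCP server's implementation):

```python
def chunk_by_heading(doc: str) -> list[str]:
    """Naive retrieval chunker: split a markdown document at headings,
    so each chunk is one self-contained block. Content under the same
    heading is always retrieved together; content separated by another
    heading is not."""
    chunks, current = [], []
    for line in doc.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```

If the exception lives three paragraphs away under a different heading, it lands in a different chunk, and retrieval can return the rule without it.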

How to write high-density Agent RAM

To write documentation that functions as Agent RAM, strip out all pleasantries and focus on density. Every word in the file costs tokens and processing time, so every line must serve a functional purpose.

Start by stating the rule, defining the boundary, and providing the exact syntax required. Here is the difference between a human wiki and Agent RAM retrieved via MCP:

Human Documentation (Fails in AI Context):

### Network Synchronization
When replicating entity state over the network, make sure you pack the bytes efficiently. C# reflection is too slow for the game loop, so write the serialization manually.

Agent RAM (Optimized for MCP Retrieval):

domain: network_sync
system: monogame_ecs
rules:
  - trigger: "state_replication"
    action: "Serialize Component data directly into byte arrays. Never use C# Reflection."
    error_handling: "If payload exceeds MTU size, shard the update across multiple ticks."

Follow these three rules for Agent RAM:

  1. Use Answer-First headings. Every heading should name the specific problem it solves.
  2. Keep rules and exceptions in the same block. Ensure the retrieval chunk captures the full context.
  3. Use concrete numbers. Replace "significantly faster" with "response time under 200ms."

When you map out the 25,000 lines across these three tiers, the context ceiling disappears. The agent never sees more than about 1,500 lines of documentation at any given moment, yet it operates with the intelligence of a system backed by 25,000 lines of rules.