Gemini Agentic Workflows: Solving Model Laziness with Skill Graphs

I spent a week trying to figure out why Gemini kept skipping the most important parts of my writing workflow.

I had done what you are supposed to do: I broke my monolithic system prompt into separate, modular markdown files. I created one file for drafting, one for editing, one for formatting, and one for social distribution. It was a clean directory of skills for the agent to use.

The problem? The relationship between these skills was only implied.

I would ask the agent to write a post, and it would immediately spit out a finished LinkedIn thread. It completely ignored the data enrichment and audit steps sitting right there in the directory. It wasn't failing to run. It was just taking shortcuts.

Why LLMs skip steps in complex workflows

Models skip steps because reasoning accuracy drops as the number of available tools increases. Research on LLM tool selection shows that once you cross 10–12 tools, models start making sub-optimal choices.

If you hand an agent a flat menu of 15 tools, it will eventually reason its way into a shortcut to save tokens. Ask it to write a researched article, and it might decide the search tool isn't necessary because it already knows the answer. It skips the hard work and calls it a day.

You can try to fix this by tightening the leash with strict, deterministic rules like "always do X, then Y, then Z." But if you tighten the leash too much, the model breaks. You strip away the contextual reasoning that makes an LLM useful. You end up with a brittle script that fails the second it hits an edge case.

You don't need a strict track. You need a loose leash.

Architecture: Using Skill Graphs to limit agent tool visibility

The Skill Graph architecture solves tool-selection errors by isolating logic into three distinct spaces: Self, Skills, and Ops. This limits the agent's visibility to only the tools relevant to its current state.

I restructured the system into a directory-based graph. By separating the agent's identity from its capability and its current task, you prevent the model from seeing tools it doesn't need yet.

```
├── self/                # Identity & Constraints
│   ├── identity.md      # Who the agent is
│   └── voice.md         # How it speaks
├── skills/              # The Capability Graph
│   ├── _content-hub.md  # The Map of Content (Entry Point)
│   ├── drafting.md      # Logic for initial prose
│   └── audit.md         # Logic for fact-checking
└── ops/                 # State & Ledger
    ├── queue.md         # Current task list
    └── session-logs/    # Metadata for feedback loops
```

This three-space architecture forces a separation of concerns. The agent starts with its identity, checks the ledger to see where it left off, and only then navigates to the specific skill node required for the next step.

Instead of loading every tool into the context window at once, the agent relies on progressive disclosure. It starts at an index node. It cannot see the distribution tools until it successfully navigates through the drafting and auditing nodes.
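That bootstrap sequence can be sketched in a few lines. This is a minimal illustration, not Gemini's actual loader: the file names come from the directory tree above, but the `bootstrap_context` helper and the `agent/` root are assumptions.

```python
from pathlib import Path

# Hypothetical root of the agent's skill graph (file names from the tree above).
ROOT = Path("agent")

def bootstrap_context() -> str:
    """Assemble the starting context: identity, ledger, and the entry node only.

    Downstream skill files (drafting, audit, distribution) are deliberately
    left out of the context window until the agent navigates to them.
    """
    parts = [
        (ROOT / "self" / "identity.md").read_text(),        # who the agent is
        (ROOT / "self" / "voice.md").read_text(),           # how it speaks
        (ROOT / "ops" / "queue.md").read_text(),            # ledger: where it left off
        (ROOT / "skills" / "_content-hub.md").read_text(),  # entry node / Map of Content
    ]
    return "\n\n---\n\n".join(parts)
```

The point of the sketch is what is absent: nothing under skills/ except the entry node ever reaches the first prompt.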

The graph restricts options based on the agent's current location. Each skill file contains a Next Steps section with wikilinks to downstream nodes. To make this work, the system prompt includes a simple directive: "Before performing any action, read the current node and identify the allowed downstream transitions. You cannot call a tool unless it is linked in your current node."
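A thin guard layer can enforce that directive mechanically rather than trusting the model to obey it. The sketch below assumes a Next Steps heading convention and `[[wikilink]]` syntax; `allowed_transitions` and `guard_tool_call` are hypothetical helpers, not part of any framework.

```python
import re

def allowed_transitions(node_markdown: str) -> set[str]:
    """Extract [[wikilinks]] from the node's Next Steps section.

    Only these targets are visible to the agent from its current node.
    """
    # Grab everything after a "Next Steps" heading (assumed convention).
    match = re.search(r"Next Steps(.*)", node_markdown, re.DOTALL | re.IGNORECASE)
    section = match.group(1) if match else ""
    return set(re.findall(r"\[\[([^\]]+)\]\]", section))

def guard_tool_call(tool: str, node_markdown: str) -> None:
    """Refuse any tool call that is not linked from the current node."""
    if tool not in allowed_transitions(node_markdown):
        raise PermissionError(f"'{tool}' is not reachable from this node")

drafting_node = """\
# Drafting

Write the first pass of prose.

## Next Steps
- [[audit]] once the draft is complete
"""

guard_tool_call("audit", drafting_node)       # allowed: linked in Next Steps
# guard_tool_call("distribute", drafting_node)  # would raise PermissionError
```

The check runs in the orchestrator, outside the model, so a lazy shortcut becomes a hard error instead of a silent skip.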

The agent has total freedom to reason within the drafting node, but it lacks the visibility to skip ahead to distribution until it reaches the audit node. It is a structural boundary for the model's attention.

Solving agent amnesia with a markdown state ledger

Moving the agent's state into a file-system ledger like ops/queue.md eliminates session amnesia by providing a hard-coded ground truth outside the model's volatile context window.

The graph fixed the skipped steps, but it didn't fix the memory drop-off. If a session dropped or I paused the work, the model would forget where we left off. The fix was moving the workflow out of the model's head and into the file system.

Now, at the start of every session, the agent reads a state queue that acts as a ledger. Instead of relying on a fragile conversation history, it reads a concrete markdown checklist showing that enrichment is done and the audit is pending. It sees exactly what happened and what needs to happen next.

The file system becomes the ground truth. The LLM just executes the next logical step on the ledger.

The Tradeoff: Latency vs. Reliability

Moving from a flat prompt to a skill graph introduces two specific costs:

  1. Token Overhead: Every time the agent moves to a new node, you are swapping context. You will spend more tokens on navigation than you would in a monolithic prompt.
  2. Latency: Each node transition is essentially a new thought cycle. If you need a sub-second response for a simple chat, a graph is overkill.

But for complex, multi-step workflows, reliability is the leading metric. I will take a 10-second delay if it means the agent actually completes the audit instead of hallucinating that the work is done.

Deconstructing the Monolith

Moving from a flat directory to a living graph requires a fundamental shift in how the orchestrator discovers what it can do.

If you are still packing workflow rules into a single system prompt, you have likely hit the ceiling where the model starts ignoring instructions. Following the same logic that led me to stop forcing one model to do everything, I stopped forcing a single prompt to hold the entire world.

Here is the practical path to deconstructing the monolith:

  1. Adopt the Three-Space Architecture: Separate your agent's brain into self for identity, skills for capability, and ops for current state.
  2. Wire it with Wikilinks: Instead of giving the agent a flat array of JSON tool schemas, give it Maps of Content. A domain map uses contextual wikilinks to show the agent why it should navigate to a specific skill.
  3. Build a Feedback Loop: Use your ops folder to log session metadata. You can then use commands to mine these logs for patterns. If the agent repeatedly fails a specific API call, it can learn the fix and create a new node automatically.
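Step 3 can start as simply as counting failures per node across the session logs. The one-event-per-line log format and the `recurring_failures` helper below are assumptions for illustration, not a prescribed schema.

```python
from collections import Counter

# Assumed format for lines mined from ops/session-logs/:
# "<node> <status>", one event per line.
logs = [
    "audit ok",
    "enrichment fail",
    "enrichment fail",
    "drafting ok",
    "enrichment fail",
]

def recurring_failures(lines: list[str], threshold: int = 3) -> list[str]:
    """Nodes that failed at least `threshold` times are candidates
    for a new skill node that encodes the fix."""
    fails = Counter(line.split()[0] for line in lines if line.endswith("fail"))
    return [node for node, count in fails.items() if count >= threshold]

print(recurring_failures(logs))  # → ['enrichment']
```

Once a node crosses the threshold, the orchestrator can prompt the agent to draft a new skill file capturing the workaround.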

In my build logs, moving from a flat directory of 15 tools to this structured graph reduced skipped-step errors by 60%. Monolithic prompts break under complexity, and flat directories invite laziness. The Skill Graph offers a repeatable playbook for multi-step agentic work that maintains focus without sacrificing reasoning.