Multi-Agent System Design: Moving From Prompts to Contracts

In Part 1, we looked at how agents find each other via a skills graph. But discovery is only half the battle. Once an orchestrator locates a sub-agent, the focus shifts from finding the tool to governing the execution.

TL;DR: Multi-agent systems fail in production when they rely on natural language for handoffs. To build reliable systems, you must replace "polite requests" in system prompts with deterministic contracts: schemas defined with Pydantic and enforced through Structured Outputs. The contract, not the prompt, must be the only source of truth.

Right now, most developers handle multi-agent handoffs with vibes. They pass a giant system prompt to a secondary agent and hope it follows the instructions. This works for weekend projects but fails in high-stakes production environments. You cannot run a scalable system on "please do not leak data." You need a shift from prompts to contracts.

Why the orchestrator is always accountable (DeepMind research)

The core of safe delegation is the separation of responsibility from accountability. Google DeepMind recently published a paper titled Intelligent Delegation that outlines this framework for agentic systems.

If your orchestrator hands a medical record to a scrubbing agent and that sub-agent fails to remove a patient name, the sub-agent is responsible for the error. The orchestrator, however, remains accountable to the user. It cannot shift the blame to the sub-agent, because the orchestrator chose to delegate the task and defined its parameters.

To manage this accountability, the orchestrator needs a deterministic way to box the sub-agent in.

The 5-step loop for safe agent delegation

The DeepMind research frames safe handoffs through a specific five-step cycle. If you are building an agentic system, this is your execution playbook.

1. Decompose

Never hand an open-ended goal to a sub-agent. Break the task into its smallest components. Do not ask an agent to "clean up a user profile." Ask it to "extract and mask all 9-digit numerical strings." Bounded tasks limit the blast radius.
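To make "bounded" concrete, the narrowed task fits in a few lines of deterministic code. This is a minimal sketch, assuming "9-digit numerical strings" means standalone runs of exactly nine digits; the function name and mask character are illustrative, not taken from any library. Because the task is this narrow, the orchestrator can check a sub-agent's result against the same rule.

```python
import re

# Hypothetical bounded task: mask every standalone run of exactly nine digits.
# The \b word boundaries leave longer digit runs (e.g. 10-digit references) alone.
NINE_DIGITS = re.compile(r"\b\d{9}\b")


def mask_nine_digit_strings(text: str) -> tuple[str, int]:
    """Return the masked text and the number of strings that were masked."""
    return NINE_DIGITS.subn("*" * 9, text)


masked, count = mask_nine_digit_strings("SSN 123456789, ref 1234567890")
print(masked, count)  # → SSN *********, ref 1234567890 1
```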

2. Assess Capability

Before the handoff, the orchestrator checks agent credentials via the skills graph. Does this sub-agent have a track record of parsing JSON correctly? Has it been audited for this specific data type?
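A capability check can be as simple as filtering skills-graph entries before the handoff. The sketch below assumes a hypothetical in-memory `AgentRecord` with skill tags, audited data types, and a historical validation pass rate; the 0.98 threshold is an arbitrary placeholder, and a real skills graph would sit behind a service rather than a Python list.

```python
from dataclasses import dataclass


@dataclass
class AgentRecord:
    # Hypothetical skills-graph entry; field names are illustrative.
    agent_id: str
    skills: set[str]
    audited_data_types: set[str]
    validation_pass_rate: float  # share of past outputs that passed schema validation


def eligible_agents(graph: list[AgentRecord], skill: str, data_type: str,
                    min_pass_rate: float = 0.98) -> list[AgentRecord]:
    """Keep agents with the skill, an audit for the data type, and a strong
    track record; return them best track record first."""
    matches = [
        a for a in graph
        if skill in a.skills
        and data_type in a.audited_data_types
        and a.validation_pass_rate >= min_pass_rate
    ]
    return sorted(matches, key=lambda a: a.validation_pass_rate, reverse=True)


graph = [
    AgentRecord("scrubber-a", {"pii_masking"}, {"medical"}, 0.99),
    AgentRecord("scrubber-b", {"pii_masking"}, {"financial"}, 0.995),  # not audited for medical
    AgentRecord("scrubber-c", {"pii_masking"}, {"medical"}, 0.90),     # weak track record
    AgentRecord("scrubber-d", {"pii_masking"}, {"medical"}, 0.999),
]
print([a.agent_id for a in eligible_agents(graph, "pii_masking", "medical")])
# → ['scrubber-d', 'scrubber-a']
```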

3. Risk Assessment

Determine the stakes before execution. If an agent is formatting a public blog post, the risk is low. If it is executing a SQL command or parsing financial records, a single error can corrupt data or expose sensitive information. High-risk tasks require the strictest contracts.
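One way to encode this is a lookup from task type to risk tier, and from tier to the controls a handoff must carry. The task names, tiers, and control labels here are hypothetical placeholders. The one deliberate design choice is that unknown task types default to the highest tier, so the system fails closed.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical policy table: task type -> risk tier.
TASK_RISK = {
    "format_blog_post": Risk.LOW,
    "summarize_internal_doc": Risk.MEDIUM,
    "execute_sql": Risk.HIGH,
    "parse_financial_records": Risk.HIGH,
}

# Hypothetical contract requirements per tier; higher tiers add constraints.
REQUIRED_CONTROLS = {
    Risk.LOW: {"output_schema"},
    Risk.MEDIUM: {"output_schema", "audit_log"},
    Risk.HIGH: {"input_schema", "output_schema", "audit_log", "confidence_floor"},
}


def assess(task_type: str) -> tuple[Risk, set[str]]:
    # Unknown task types are treated as HIGH risk: fail closed, not open.
    risk = TASK_RISK.get(task_type, Risk.HIGH)
    return risk, REQUIRED_CONTROLS[risk]


risk, controls = assess("execute_sql")
print(risk.name, sorted(controls))
# → HIGH ['audit_log', 'confidence_floor', 'input_schema', 'output_schema']
```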

4. Allocate and Contract

Stop using natural language to define the rules of engagement. Use hard contracts. Define explicit input and output schemas using Pydantic and force the sub-agent to use Structured Outputs.

Consider a standard Pydantic model for a scrubbing agent:

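from typing import List
from uuid import UUID
from pydantic import BaseModel, Field

class ScrubbedData(BaseModel):
    original_id: UUID
    clean_text: str
    redacted_fields: List[str]
    confidence_score: float = Field(ge=0.95, le=1.0)  # Enforce a 0.95 floor (and a 1.0 ceiling)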

If the sub-agent returns free-form text instead of a payload that matches ScrubbedData, or if confidence_score drops to 0.90, validation raises an error and the orchestrator rejects the payload immediately. No interpretation allowed.
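A minimal sketch of the rejection step, assuming Pydantic v2, where `BaseModel.model_validate_json` parses and validates in one call and raises `pydantic.ValidationError` on malformed JSON or any schema violation. The helper name `accept_or_reject` and the sample payloads are hypothetical. The model is restated so the snippet runs on its own.

```python
from typing import List, Optional
from uuid import UUID

from pydantic import BaseModel, Field, ValidationError


class ScrubbedData(BaseModel):
    original_id: UUID
    clean_text: str
    redacted_fields: List[str]
    confidence_score: float = Field(ge=0.95, le=1.0)


def accept_or_reject(raw_output: str) -> Optional[ScrubbedData]:
    """Return the validated payload, or None if it breaks the contract."""
    try:
        # Parses the JSON and validates every field in one step.
        return ScrubbedData.model_validate_json(raw_output)
    except ValidationError as err:
        # Hypothetical handling: a real orchestrator would write err.errors()
        # to its audit trail instead of printing.
        print(f"rejected: {err.error_count()} validation error(s)")
        return None


good = ('{"original_id": "12345678-1234-5678-1234-567812345678", '
        '"clean_text": "Patient [REDACTED] was admitted.", '
        '"redacted_fields": ["patient_name"], "confidence_score": 0.97}')
low_confidence = good.replace("0.97", "0.90")
free_text = "Sure! I removed the patient's name."

for payload in (good, low_confidence, free_text):
    print(accept_or_reject(payload) is not None)
```

Note that the 0.95 floor only constrains the number the sub-agent reports; it does not verify that the score is calibrated.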

5. Monitor

The orchestrator must validate every output and log the exact inputs, outputs, and latency of each call. This audit trail updates the skills graph. If a sub-agent consistently fails validation, its reputation score drops until it is effectively de-listed from the discovery layer.
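A sketch of the monitoring step, under stated assumptions: the audit log is an in-memory list, and reputation is a hypothetical exponential moving average of pass/fail results with a de-listing threshold. `run_monitored`, `Reputation`, and both constants are illustrative, not part of any discovery-layer API.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class AuditEntry:
    agent_id: str
    inputs: Any
    outputs: Any
    latency_s: float
    passed: bool


@dataclass
class Reputation:
    # Hypothetical scoring: exponential moving average of pass (1) / fail (0).
    score: float = 1.0
    alpha: float = 0.2          # weight given to the newest result
    delist_below: float = 0.5   # below this, the agent drops out of discovery

    def update(self, passed: bool) -> None:
        self.score = (1 - self.alpha) * self.score + self.alpha * float(passed)

    @property
    def listed(self) -> bool:
        return self.score >= self.delist_below


audit_log: list[AuditEntry] = []
reputations: dict[str, Reputation] = {}


def run_monitored(agent_id: str, call: Callable[[Any], Any],
                  validate: Callable[[Any], bool], inputs: Any) -> Optional[Any]:
    """Call the sub-agent, record inputs/outputs/latency, update its reputation."""
    start = time.perf_counter()
    outputs = call(inputs)
    latency = time.perf_counter() - start
    passed = validate(outputs)
    audit_log.append(AuditEntry(agent_id, inputs, outputs, latency, passed))
    reputations.setdefault(agent_id, Reputation()).update(passed)
    return outputs if passed else None


# A sub-agent that keeps returning free text instead of a structured payload.
for _ in range(4):
    run_monitored("scrubber-x", lambda x: "free text",
                  lambda out: isinstance(out, dict), {"text": "..."})
print(len(audit_log), reputations["scrubber-x"].listed)  # → 4 False
```

With `alpha=0.2`, an agent starting at 1.0 drops to about 0.41 after four consecutive failures and falls below the 0.5 threshold; after three failures it is still listed at about 0.51.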

Scaling trust on the open agentic web

The delegation loop works in a closed environment where you control every agent. But on the open web, your orchestrator will eventually hire a parsing agent built by a third party. You cannot enforce a contract through shared code alone.

This is where cryptoeconomic security and autonomous payment layers enter the picture. In a decentralized network, contracts require stakes. Protocols like Skyfire and Olas are building infrastructure in which agents hold wallets and put their reputations at stake.

If a third-party agent wants to bid on your data scrubbing job, it provides a cryptographic commitment. If its output fails Pydantic validation against the contract's schema, the orchestrator refuses to sign the payment transaction. Financial penalties replace API rate limits, creating a market where agents are economically incentivized to be honest.
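The mechanics can be sketched without any blockchain. This is a hypothetical escrow, not the API of Skyfire, Olas, or any real payment layer. It assumes the "cryptographic commitment" is a SHA-256 commit-reveal hash over the agent's output and a random nonce, and that failing validation forfeits the agent's stake. `valid_scrub` is a stdlib stand-in for the Pydantic check.

```python
import hashlib
import hmac
import json
import secrets
from dataclasses import dataclass
from typing import Callable


def commit(output: str, nonce: bytes) -> str:
    """Agent side: publish this hash before revealing the output itself."""
    return hashlib.sha256(nonce + output.encode("utf-8")).hexdigest()


def valid_scrub(output: str) -> bool:
    """Stdlib stand-in for Pydantic validation of the scrubbing contract."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(k in data for k in ("clean_text", "redacted_fields"))


@dataclass
class Escrow:
    # Hypothetical escrow; not the API of Skyfire, Olas, or any real protocol.
    payment: float      # fee owed to the agent if the contract is met
    agent_stake: float  # collateral the agent locked when it bid
    settled: bool = False

    def settle(self, commitment: str, output: str, nonce: bytes,
               validate: Callable[[str], bool]) -> tuple[float, float]:
        """Return (fee paid to the agent, stake returned to the agent)."""
        if self.settled:
            raise RuntimeError("escrow already settled")
        self.settled = True
        # The revealed output must match the earlier commitment...
        honest_reveal = hmac.compare_digest(commit(output, nonce), commitment)
        # ...and must satisfy the schema before the orchestrator signs.
        if honest_reveal and validate(output):
            return self.payment, self.agent_stake
        return 0.0, 0.0  # no payment, and the stake is forfeited


nonce = secrets.token_bytes(16)
output = json.dumps({"clean_text": "Patient [REDACTED]", "redacted_fields": ["name"]})
commitment = commit(output, nonce)
print(Escrow(payment=5.0, agent_stake=20.0).settle(commitment, output, nonce, valid_scrub))
# → (5.0, 20.0)
```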

3 steps to secure your agentic workflows today

Stop treating sub-agents like smart interns. Treat them like untrusted microservices. If you are passing plain text instructions between agents, you have a structural vulnerability.

  1. Schema-first design: Define the BaseModel for every handoff before you write a single line of the system prompt.
  2. Deterministic rejection: Build a validation wrapper. If the sub-agent output does not fit the schema, throw an error, log the failure, and retry or kill the job.
  3. Decouple the wallet: Look into agent-to-agent payment layers. Start planning for a future where your orchestrator pays for specialized compute without a human in the loop.
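The validation wrapper from item 2 can be sketched as a retry loop around any parser that raises `ValueError`. That covers both `json.JSONDecodeError` and Pydantic v2's `ValidationError`, which subclasses `ValueError`. The names `call_with_contract`, `HandoffFailed`, and `parse_scrub` are hypothetical, and the stdlib parser stands in for a Pydantic model so the snippet has no dependencies.

```python
import json
import logging
from typing import Any, Callable

logger = logging.getLogger("orchestrator")


class HandoffFailed(Exception):
    """Raised when a sub-agent never produces schema-valid output."""


def call_with_contract(call_agent: Callable[[str], str],
                       parse: Callable[[str], Any],
                       task: str,
                       max_attempts: int = 3) -> Any:
    """Call a sub-agent, validate its output, retry on failure, then kill the job."""
    for attempt in range(1, max_attempts + 1):
        raw = call_agent(task)
        try:
            # `parse` could be ScrubbedData.model_validate_json under Pydantic v2.
            return parse(raw)
        except ValueError as err:  # JSONDecodeError and Pydantic's ValidationError both subclass ValueError
            logger.warning("attempt %d/%d rejected: %s", attempt, max_attempts, err)
    raise HandoffFailed(f"no schema-valid output after {max_attempts} attempts")


def parse_scrub(raw: str) -> dict:
    """Hypothetical stdlib parser standing in for a Pydantic model."""
    data = json.loads(raw)
    if not isinstance(data, dict) or "clean_text" not in data:
        raise ValueError("payload is missing clean_text")
    return data


# A flaky sub-agent: free text on the first call, valid JSON on the second.
replies = iter(["Sure! Here is the cleaned text.", '{"clean_text": "Patient [REDACTED]"}'])
result = call_with_contract(lambda task: next(replies), parse_scrub, "mask patient names")
print(result)  # → {'clean_text': 'Patient [REDACTED]'}
```

Killing the job after a fixed number of attempts keeps a misbehaving sub-agent from consuming budget indefinitely, and every rejected attempt leaves a log line for the audit trail.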

Build the contract first and let the agents figure out how to fulfill it. This is the only path from experiments to production-grade systems.