One Model, Three Roles: How a 1.2B Model Plays Reasoner, Planner, and Solver

Most agentic AI systems assume you have access to multiple models. A smart one for planning. A cheap one for execution. Maybe a specialized one for code.

I have one model. 1.2 billion parameters. Running on a Raspberry Pi 5.

And it plays three completely different roles in the same pipeline.

The Setup

The voice keyboard I'm building runs AI locally. No cloud calls. Everything happens on an $80 computer. To make it capable of multi-step reasoning, I built a pipeline based on the ReWOO architecture:

Reasoner → Planner → Worker → Solver

Four phases. One model. Same weights, different prompts.

The Three Roles

Here's what each role does:

Reasoner - Chain-of-thought analysis. "What is this user actually asking? What would a complete answer require?"

Planner - Tool routing. "What steps and tools are needed? Output a numbered plan."

Solver - Synthesis. "Given the evidence from the tools, write the final response."

The Worker phase executes tools (web search, calculator, memory lookup) but doesn't call the LLM for routing - it just runs code.
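The Worker is essentially a dispatch table. Here's a minimal sketch with hypothetical stub functions (the real web search, calculator, and memory lookup live elsewhere; the names here are mine):

```python
def calculate(expr):
    # Stand-in for the real calculator. Production code should use a
    # safe expression parser rather than eval.
    return str(eval(expr, {"__builtins__": {}}))

def lookup(query):
    return f"[search results for: {query}]"   # web search stub

def remember(query):
    return f"[memories matching: {query}]"    # memory lookup stub

TOOL_TABLE = {"Calculate": calculate, "Lookup": lookup, "Remember": remember}

def run_worker(steps):
    """Execute each planned (tool, argument) step. No LLM call here."""
    return [(tool, TOOL_TABLE[tool](arg)) for tool, arg in steps]
```

Evidence comes back as (tool, result) pairs the Solver can cite.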

Three LLM calls total. Each one uses the same model with a different system prompt.

The Prompts

This is where it gets interesting. The same 1.2B model behaves completely differently based on how you frame its role.

Reasoner:

You are a THOUGHTFUL ANALYZER. Today is {today}.

CALIBRATE YOUR THINKING:
- Simple question (weather, math) → Keep it simple. Just identify what to do.
- Complex question (plans, analysis) → Think deeper.

FORMAT:
Think: [analysis]
Requirements: [what needs to be done]

Planner:

You create tool call plans. Output ONLY numbered tool calls.

TOOLS:
- Calculate(expr="...") - math AND dates
- Lookup(query="...") - web search
- Remember(query="...") - search past conversations
- LLM(prompt="...") - creative tasks, analysis

FORMAT:
1. ToolName(param="value")

Solver:

You are a helpful AI assistant.
Your job: provide a thorough, helpful response using the evidence.

HOW TO RESPOND:
1. TRUST THE EVIDENCE. The tools already ran - use what they found.
2. Be direct and confident. Don't hedge.
3. Don't just summarize - ADD VALUE.

Notice the personality shift. The Reasoner is analytical. The Planner is terse and structured. The Solver is conversational and confident.

Same weights. Different personas.
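In code, the persona switch is nothing more than swapping the system message. A sketch with abbreviated prompts (the constants mirror the real ones, but the wording here is truncated):

```python
# Abbreviated stand-ins for the full system prompts shown above.
REASONER_SYSTEM_PROMPT = "You are a THOUGHTFUL ANALYZER. ..."
PLANNER_SYSTEM_PROMPT = "You create tool call plans. Output ONLY numbered tool calls. ..."
SOLVER_SYSTEM_PROMPT = "You are a helpful AI assistant. ..."

def build_messages(system_prompt, user_query):
    """Same model, different persona: only the system message changes."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query},
    ]
```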

Why This Works

Three things make this possible:

1. KV Cache Reset

Between each phase, I reset the model's KV cache:

self.llm.reset()

This gives each role a clean slate. The Reasoner's chain-of-thought doesn't bleed into the Planner's structured output.
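The pattern around every LLM call looks roughly like this. The StubLLM below is a stand-in for the real model object (llama-cpp-python's Llama exposes a reset() with this effect; the generate() signature is invented for the sketch):

```python
class StubLLM:
    """Fake model that just counts resets, standing in for the 1.2B model."""
    def __init__(self):
        self.resets = 0

    def reset(self):
        # On the real model this clears the KV cache.
        self.resets += 1

    def generate(self, system_prompt, user_prompt):
        return f"[response as {system_prompt.split()[0]}]"

def call_role(llm, system_prompt, user_prompt):
    """One pipeline phase: clean slate first, then generate."""
    llm.reset()  # the previous role's context must not leak into this one
    return llm.generate(system_prompt, user_prompt)
```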

2. Strong Format Constraints

Small models need explicit formatting rules. The Planner prompt specifies exact output format (1. Tool(param="value")). The Solver prompt uses numbered rules.

Without these constraints, the model drifts. It starts planning in the middle of solving. It adds commentary when you need just tool calls.
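The strict format also makes the Planner's output machine-parseable. A sketch of how the numbered lines might be extracted (regex and function name are mine, not from the project); lines that drift from the format are simply dropped:

```python
import re

# Matches lines like: 1. Lookup(query="weather in Paris")
STEP_RE = re.compile(r'^\s*\d+\.\s*(\w+)\(\w+="([^"]*)"\)')

def parse_plan(text):
    """Return (tool, argument) pairs; skip anything off-format."""
    steps = []
    for line in text.splitlines():
        m = STEP_RE.match(line)
        if m:
            steps.append((m.group(1), m.group(2)))
    return steps
```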

3. Role Separation > Role Complexity

Each prompt does one thing. The Reasoner analyzes. It doesn't plan. The Planner outputs steps. It doesn't explain why. The Solver synthesizes. It doesn't second-guess the evidence.

Trying to combine roles ("analyze, then plan, then respond") creates confusion. Separating them creates clarity. This lines up with Decomposed Prompting (Khot et al., 2022) — their research showed that breaking a complex task into sub-tasks handled by different specialized prompts consistently outperforms a single monolithic prompt. They call them "decomposers" and "sub-task handlers." I call them Reasoner, Planner, and Solver. Same idea.

The Calibration Insight

One pattern appears in all three prompts: calibration.

CALIBRATE YOUR THINKING:
- Simple question → Keep it simple.
- Complex question → Think deeper.

Small models have a tendency to over-elaborate. Ask "what's 15% of 847?" and you might get a philosophical treatise on percentages.

The calibration line tells the model: match your output to the input complexity. It works surprisingly well.

What I Tried That Didn't Work

Combining Reasoner and Planner. "First analyze the request, then output a plan."

The model would start planning halfway through its analysis. Or it would analyze indefinitely without ever producing a plan. Separation fixed this.

No KV cache reset. Let context accumulate across phases.

The Solver would start referencing the Reasoner's internal analysis instead of the actual tool results. Confusing and inconsistent.

Same prompt, different instructions. "Now you're in Planner mode."

The model couldn't cleanly shift gears. Having entirely different system prompts works better than role-switching mid-conversation.

The Flexible Pipeline

The architecture supports different combinations:

Mode      Pipeline                               Use Case
Full      Reasoner → Planner → Worker → Solver   Complex queries with tool use
No Tools  Reasoner → Solver                      Open-ended discussion
Direct    Planner → Worker → Solver              Simple tool queries

This flexibility matters. "What's the weather?" doesn't need reasoning - go straight to Planner. "Help me think through this decision" doesn't need tools - go Reasoner → Solver.

The same model handles all of these. Just different subsets of the pipeline.
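The mode table reduces to two flags. A sketch of the selection logic (function name hypothetical):

```python
def select_phases(enable_reasoning=True, enable_tools=True):
    """Map the two flags to a pipeline subset. Worker always follows
    Planner, and Solver always runs last."""
    phases = []
    if enable_reasoning:
        phases.append("Reasoner")
    if enable_tools:
        phases += ["Planner", "Worker"]
    phases.append("Solver")
    return phases
```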

The Numbers

On a Raspberry Pi 5 with 8GB RAM:

Phase     Time     Token Budget
Reasoner  3-8s     ~500 tokens
Planner   2-5s     ~200 tokens
Worker    varies   (tool execution)
Solver    5-15s    ~1000 tokens

Total for a 3-step reasoning task: 15-30 seconds.

Not instant. But usable for questions that genuinely require thinking.

Why Not Use Multiple Models?

The obvious question. Systems like ReAct assume you can run interleaved think-act-observe loops — each one calling the model again with the full context. On a Pi with a 1.2B model, that means quadratic token growth and multi-minute latencies. Separating the roles with ReWOO's plan-first approach keeps it to a fixed number of calls.

But beyond that, practical reasons:

  1. Memory. Each model needs RAM. One model = one memory footprint.
  2. Load time. Loading a model takes seconds. I load once at startup.
  3. Simplicity. One model path, one set of weights, one quantization format.

But there's a deeper reason: it works.

A 1.2B model with good prompts outperforms a 7B model with generic prompts on structured tasks. The prompts are doing the heavy lifting. The model I'm using, LFM2.5-1.2B-Instruct from Liquid AI, was designed for exactly this kind of on-device work. Their Thinking variant fits in under 1GB and handles reasoning natively. I later benchmarked it against the larger 2.6B model, and the smaller one won. When your model is built for the constraint, structured prompting goes even further.

What I Learned

1. Prompts are personas. The same model can be analytical, structured, or conversational. It depends on how you frame its role.

2. Separation beats combination. Three clear roles work better than one confused role trying to do everything. Research backs this up.

3. Reset state between roles. KV cache bleeding causes weird behavior. Clean slate for each phase.

4. Calibration is underrated. "Match your output to the input complexity" is a simple instruction that prevents a lot of problems.

The Code

The implementation lives in a single file. The key class is ThinkingPipeline with three system prompts:

  • REASONER_SYSTEM_PROMPT
  • PLANNER_SYSTEM_PROMPT
  • SOLVER_SYSTEM_PROMPT

And the run() method that orchestrates the phases:

def run(self, user_query, enable_reasoning=True, enable_tools=True):
    # Defaults let the Solver handle any skipped phase gracefully
    reasoning, steps, evidence = None, None, None

    # Phase 1: Reasoner
    if enable_reasoning:
        reasoning = self._run_reasoner(user_query)

    # Phase 2 & 3: Planner + Worker
    if enable_tools:
        steps = self._run_planner(user_query, reasoning)
        evidence = self._run_worker(steps)

    # Phase 4: Solver
    response = self._run_solver_flexible(user_query, reasoning, steps, evidence)
    return response

Each _run_* method resets the KV cache, builds its prompt, and calls the same self.llm instance.

One model. Three roles. Same weights.