One Model, Three Roles: How a 1.2B Model Plays Reasoner, Planner, and Solver

Most agentic AI systems assume you have access to multiple models. A smart one for planning. A cheap one for execution. Maybe a specialized one for code.
I have one model. 1.2 billion parameters. Running on a Raspberry Pi 5.
And it plays three completely different roles in the same pipeline.
The Setup
The voice keyboard I'm building runs AI locally. No cloud calls. Everything happens on an $80 computer. To make it capable of multi-step reasoning, I built a pipeline based on the ReWOO architecture:
Reasoner → Planner → Worker → Solver
Four phases. One model. Same weights, different prompts.
The Three Roles
Here's what each role does:
Reasoner - Chain-of-thought analysis. "What is this user actually asking? What would a complete answer require?"
Planner - Tool routing. "What steps and tools are needed? Output a numbered plan."
Solver - Synthesis. "Given the evidence from the tools, write the final response."
The Worker phase executes tools (web search, calculator, memory lookup) but doesn't call the LLM for routing - it just runs code.
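Because the Worker never calls the LLM, it can be a plain dispatch table. A minimal sketch of that idea — the registry, the lambda handlers, and `run_worker` are all illustrative names, not the actual implementation:

```python
# Hypothetical tool registry. The real tools (web search, calculator,
# memory lookup) are replaced with stand-ins here; the LLM(...) tool is
# omitted since it would call back into the model.
TOOLS = {
    "Calculate": lambda expr: str(eval(expr, {"__builtins__": {}})),  # demo only; use a safe math parser in practice
    "Lookup": lambda query: f"[search results for: {query}]",
    "Remember": lambda query: f"[past conversations matching: {query}]",
}

def run_worker(steps):
    """Execute each planned tool call in order. No LLM involved --
    this phase just runs code and collects evidence for the Solver."""
    evidence = []
    for tool_name, arg in steps:
        handler = TOOLS.get(tool_name)
        result = handler(arg) if handler else f"[unknown tool: {tool_name}]"
        evidence.append((tool_name, result))
    return evidence

evidence = run_worker([("Calculate", "2 + 2"), ("Lookup", "weather Berlin")])
```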
Three LLM calls total. Each one uses the same model with a different system prompt.
The Prompts
This is where it gets interesting. The same 1.2B model behaves completely differently based on how you frame its role.
Reasoner:
You are a THOUGHTFUL ANALYZER. Today is {today}.
CALIBRATE YOUR THINKING:
- Simple question (weather, math) → Keep it simple. Just identify what to do.
- Complex question (plans, analysis) → Think deeper.
FORMAT:
Think: [analysis]
Requirements: [what needs to be done]
Planner:
You create tool call plans. Output ONLY numbered tool calls.
TOOLS:
- Calculate(expr="...") - math AND dates
- Lookup(query="...") - web search
- Remember(query="...") - search past conversations
- LLM(prompt="...") - creative tasks, analysis
FORMAT:
1. ToolName(param="value")
Solver:
You are a helpful AI assistant.
Your job: provide a thorough, helpful response using the evidence.
HOW TO RESPOND:
1. TRUST THE EVIDENCE. The tools already ran - use what they found.
2. Be direct and confident. Don't hedge.
3. Don't just summarize - ADD VALUE.
Notice the personality shift. The Reasoner is analytical. The Planner is terse and structured. The Solver is conversational and confident.
Same weights. Different personas.
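The mechanics of the persona swap are simple: one model object, one system prompt per phase, a reset in between. A sketch with a stub standing in for the real 1.2B model (`ROLE_PROMPTS`, `StubLLM`, and `run_phase` are illustrative names, and the prompts are truncated):

```python
# One prompt per role -- abbreviated versions of the prompts shown above.
ROLE_PROMPTS = {
    "reasoner": "You are a THOUGHTFUL ANALYZER. ...",
    "planner": "You create tool call plans. Output ONLY numbered tool calls. ...",
    "solver": "You are a helpful AI assistant. ...",
}

class StubLLM:
    """Stands in for the single local model instance."""
    def reset(self):
        pass  # the real call clears the KV cache between phases
    def chat(self, system, user):
        # Echo which persona answered, for demonstration purposes.
        return f"<{system.split('.')[0]}> reply to: {user}"

def run_phase(llm, role, user_text):
    llm.reset()  # clean slate: no bleed-over from the previous role
    return llm.chat(ROLE_PROMPTS[role], user_text)

llm = StubLLM()
out = run_phase(llm, "planner", "What's 15% of 847?")
```

Same object, same weights in the real system; only the system prompt argument changes between phases.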
Why This Works
Three things make this possible:
1. KV Cache Reset
Between each phase, I reset the model's KV cache:
```python
self.llm.reset()
```

This gives each role a clean slate. The Reasoner's chain-of-thought doesn't bleed into the Planner's structured output.

2. Strong Format Constraints
Small models need explicit formatting rules. The Planner prompt specifies exact output format (1. Tool(param="value")). The Solver prompt uses numbered rules.
Without these constraints, the model drifts. It starts planning in the middle of solving. It adds commentary when you need just tool calls.
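Format constraints work best when paired with strict parsing: anything that doesn't match the `1. Tool(param="value")` shape gets dropped, so drift and commentary never reach the Worker. A minimal parser sketch (the regex and function names are my own, not the post's implementation):

```python
import re

# Matches lines like: 1. Calculate(expr="847 * 0.15")
PLAN_LINE = re.compile(r'^\s*\d+\.\s*(\w+)\(\s*(\w+)\s*=\s*"([^"]*)"\s*\)\s*$')

def parse_plan(text):
    """Keep only lines matching the strict format; silently drop any
    commentary the model drifted into."""
    steps = []
    for line in text.splitlines():
        m = PLAN_LINE.match(line)
        if m:
            tool, param, value = m.groups()
            steps.append((tool, {param: value}))
    return steps

plan = parse_plan(
    'Sure! Here is the plan:\n'
    '1. Calculate(expr="847 * 0.15")\n'
    '2. Lookup(query="weather Berlin")'
)
```

The chatty "Sure! Here is the plan:" line is exactly the kind of drift the prompt tries to prevent; the parser makes the pipeline robust when the prompt alone isn't enough.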
3. Role Separation > Role Complexity
Each prompt does one thing. The Reasoner analyzes. It doesn't plan. The Planner outputs steps. It doesn't explain why. The Solver synthesizes. It doesn't second-guess the evidence.
Trying to combine roles ("analyze, then plan, then respond") creates confusion. Separating them creates clarity. This lines up with Decomposed Prompting (Khot et al., 2022) — their research showed that breaking a complex task into sub-tasks handled by different specialized prompts consistently outperforms a single monolithic prompt. They call them "decomposers" and "sub-task handlers." I call them Reasoner, Planner, and Solver. Same idea.
The Calibration Insight
One pattern appears in all three prompts: calibration.
CALIBRATE YOUR THINKING:
- Simple question → Keep it simple.
- Complex question → Think deeper.
Small models have a tendency to over-elaborate. Ask "what's 15% of 847?" and you might get a philosophical treatise on percentages.
The calibration line tells the model: match your output to the input complexity. It works surprisingly well.
What I Tried That Didn't Work
Combining Reasoner and Planner. "First analyze the request, then output a plan."
The model would start planning halfway through its analysis. Or it would analyze indefinitely without ever producing a plan. Separation fixed this.
No KV cache reset. Let context accumulate across phases.
The Solver would start referencing the Reasoner's internal analysis instead of the actual tool results. Confusing and inconsistent.
Same prompt, different instructions. "Now you're in Planner mode."
The model couldn't cleanly shift gears. Having entirely different system prompts works better than role-switching mid-conversation.
The Flexible Pipeline
The architecture supports different combinations:
| Mode | Pipeline | Use Case |
|---|---|---|
| Full | Reasoner → Planner → Worker → Solver | Complex queries with tool use |
| No Tools | Reasoner → Solver | Open-ended discussion |
| Direct | Planner → Worker → Solver | Simple tool queries |
This flexibility matters. "What's the weather?" doesn't need reasoning - go straight to Planner. "Help me think through this decision" doesn't need tools - go Reasoner → Solver.
The same model handles all of these. Just different subsets of the pipeline.
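The mode selection can be expressed as flag presets plus a router. The presets below mirror the table; the keyword-based `pick_mode` heuristic is purely illustrative — the post doesn't describe how routing is actually decided:

```python
# The three modes from the table, expressed as run() flag presets.
MODES = {
    "full": dict(enable_reasoning=True, enable_tools=True),
    "no_tools": dict(enable_reasoning=True, enable_tools=False),
    "direct": dict(enable_reasoning=False, enable_tools=True),
}

def pick_mode(query):
    """Toy keyword router -- an assumption, not the real routing logic."""
    q = query.lower()
    if any(w in q for w in ("weather", "convert", "what's")):
        return "direct"       # simple tool query: skip the Reasoner
    if any(w in q for w in ("think through", "help me decide", "discuss")):
        return "no_tools"     # open-ended discussion: skip Planner/Worker
    return "full"

mode = pick_mode("What's the weather?")
flags = MODES[mode]
```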
The Numbers
On a Raspberry Pi 5 with 8GB RAM:
| Phase | Time | Token Budget |
|---|---|---|
| Reasoner | 3-8s | ~500 tokens |
| Planner | 2-5s | ~200 tokens |
| Worker | varies | (tool execution) |
| Solver | 5-15s | ~1000 tokens |
Total for a 3-step reasoning task: 15-30 seconds.
Not instant. But usable for questions that genuinely require thinking.
Why Not Use Multiple Models?
The obvious question. Systems like ReAct assume you can run interleaved think-act-observe loops — each one calling the model again with the full context. On a Pi with a 1.2B model, that means quadratic token growth and multi-minute latencies. Separating the roles with ReWOO's plan-first approach keeps it to a fixed number of calls.
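A back-of-envelope comparison makes the gap concrete. The token budgets below are illustrative assumptions, not measurements:

```python
def react_prompt_tokens(steps, base=300, per_step=150):
    """Cumulative prompt tokens across a ReAct loop: every call re-sends
    the whole transcript, which grows by one thought/action/observation
    per step -- so total processing grows quadratically in step count."""
    total, ctx = 0, base
    for _ in range(steps):
        total += ctx        # this call processes everything so far
        ctx += per_step     # transcript grows before the next call
    return total

def rewoo_prompt_tokens(base=300, plan=200, evidence=400):
    """ReWOO's fixed calls: reason, plan, then solve over the evidence."""
    return base + base + (base + plan + evidence)

growth = [react_prompt_tokens(n) for n in (3, 6, 10)]  # 1350, 4050, 9750
fixed = rewoo_prompt_tokens()                           # 1500
```

Under these assumptions, a 10-step ReAct loop processes over six times the tokens of the fixed ReWOO pipeline — and on a Pi, prompt processing time scales with exactly that number.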
But beyond that, practical reasons:
- Memory. Each model needs RAM. One model = one memory footprint.
- Load time. Loading a model takes seconds. I load once at startup.
- Simplicity. One model path, one set of weights, one quantization format.
But there's a deeper reason: it works.
A 1.2B model with good prompts outperforms a 7B model with generic prompts on structured tasks. The prompts are doing the heavy lifting. The model I'm using — LFM2.5-1.2B-Instruct from Liquid AI — was designed for exactly this kind of on-device work. Their Thinking variant fits in under 1GB and handles reasoning natively. We later benchmarked it against the larger 2.6B model — the smaller one won. When your model is built for the constraint, structured prompting goes even further.
What I Learned
1. Prompts are personas. The same model can be analytical, structured, or conversational. It depends on how you frame its role.
2. Separation beats combination. Three clear roles work better than one confused role trying to do everything. Research backs this up.
3. Reset state between roles. KV cache bleeding causes weird behavior. Clean slate for each phase.
4. Calibration is underrated. "Match your output to the input complexity" is a simple instruction that prevents a lot of problems.
The Code
The implementation lives in a single file. The key class is ThinkingPipeline with three system prompts:
- REASONER_SYSTEM_PROMPT
- PLANNER_SYSTEM_PROMPT
- SOLVER_SYSTEM_PROMPT
And the run() method that orchestrates the phases:
```python
def run(self, user_query, enable_reasoning=True, enable_tools=True):
    reasoning, steps, evidence = None, None, None

    # Phase 1: Reasoner
    if enable_reasoning:
        reasoning = self._run_reasoner(user_query)

    # Phase 2 & 3: Planner + Worker
    if enable_tools:
        steps = self._run_planner(user_query, reasoning)
        evidence = self._run_worker(steps)

    # Phase 4: Solver
    response = self._run_solver_flexible(user_query, reasoning, steps, evidence)
    return response
```

Each _run_* method resets the KV cache, builds its prompt, and calls the same self.llm instance.
One model. Three roles. Same weights.