Qwen 3.5 vs Liquid LFM: Edge AI Benchmarks on Raspberry Pi 5

For months, Liquid LFM was the ceiling for edge hardware. Running a 1.2B parameter model on a Raspberry Pi 5 with context-aware reasoning felt like the limit for local AI. Qwen 3.5 turned that ceiling into a baseline.

In head-to-head edge testing, Qwen 3.5-2B scores 73.4 on GPQA compared to Liquid LFM2.5-1.2B at 38.9. The Qwen model requires 1.4GB of RAM (Q4_K_M) versus Liquid's 900MB. While Qwen leads in complex reasoning and native vision, Liquid remains the superior choice for latency-constrained CPU workloads such as voice assistants.

The 1B Class Shift: Why Qwen 3.5 Changes the SLM Baseline

I previously wrote about running Liquid LFM-1.2B on a Pi 5. It proved you could get legitimate utility out of a model that fits in under a gigabyte of RAM. Running it through llama.cpp, it maintained a solid 12 tokens per second, behaved predictably, and left enough thermal headroom that the board didn't melt.

The Qwen 3.5 small series (0.8B to 9B) changes the math for sub-3B models. They didn't just prune a 70B model until it fit on a flash drive. They changed how small models handle memory. By baking DeltaNet and scaled reinforcement learning directly into the sub-3B class, they built a tiny model that actually pauses to plan before it outputs text.

Qwen 3.5 Architecture: DeltaNet and Scaled Reinforcement Learning

Three architectural shifts make Qwen 3.5 worth testing on edge hardware.

Native multimodality. Most small vision models just bolt a CLIP encoder onto a text model. Qwen built vision directly into the base architecture. You can pass a base64 image string to a 2B model on a Pi and get accurate spatial understanding back without loading a separate vision projector.
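Here is a minimal sketch of what that looks like in practice, assuming you serve the model with llama.cpp's llama-server and that this build accepts OpenAI-style data-URL images on the chat endpoint. The port, model alias, and image filename are all my assumptions, not values from either project.

```python
# Sketch: sending a base64 image to a local llama-server chat endpoint.
# URL, model alias, and multimodal support on this build are assumptions.
import base64
import json
import urllib.request

def build_vision_payload(image_bytes: bytes, question: str) -> dict:
    """Pack a JPEG into an OpenAI-style chat request using a data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "qwen3.5-2b-instruct",  # assumed model alias
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }

if __name__ == "__main__":
    payload = build_vision_payload(open("frame.jpg", "rb").read(),
                                   "What objects are on the table?")
    req = urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",  # assumed local port
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```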

DeltaNet architecture. This swaps standard attention mechanisms for constant memory attention. Your KV cache stays flat as the conversation grows. You can theoretically push a massive context window on an 8GB Raspberry Pi without triggering an out-of-memory crash. Generation speed drops to a crawl long before you hit that limit, but the process survives.
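The memory difference is easy to see with back-of-envelope arithmetic. This sketch compares a standard per-token KV cache against a fixed-size recurrent state; every dimension here (layer count, head count, head size, state size) is an illustrative assumption, not Qwen 3.5's actual configuration.

```python
# Sketch: why a constant-state attention layer keeps memory flat.
# All dimensions below are illustrative assumptions, not real model config.

def standard_kv_cache_bytes(tokens, layers=28, kv_heads=4,
                            head_dim=128, dtype_bytes=2):
    """Standard attention: K and V vectors cached per token, per layer."""
    return tokens * layers * kv_heads * head_dim * 2 * dtype_bytes

def constant_state_bytes(layers=28, state_dim=128 * 128, dtype_bytes=2):
    """Linear-attention-style layer: one fixed-size state per layer,
    independent of how long the conversation gets."""
    return layers * state_dim * dtype_bytes

for n in (1_000, 32_000, 256_000):
    kv = standard_kv_cache_bytes(n) / 1e6
    st = constant_state_bytes() / 1e6
    print(f"{n:>7} tokens: standard KV ~{kv:8.1f} MB | constant state ~{st:.1f} MB")
```

With these toy numbers the standard cache crosses a gigabyte well before 32K tokens, while the constant state never moves, which is exactly the behavior that matters on an 8GB board with shared memory.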

Scaled reinforcement learning. Applying RL to tiny models usually causes them to overfit or collapse into repetitive garbage. Qwen stabilized it. This gives a 2B model the internal planning phase you usually only see in 30B+ parameter models.

Benchmarks: Liquid LFM2.5 vs Qwen 3.5-2B Performance

I put Liquid LFM2.5-1.2B-Thinking up against Qwen 3.5-2B-Instruct. Both were quantized to Q4_K_M and run purely on the Pi 5's CPU using llama.cpp.

# Running Qwen 3.5 on the Pi 5
llama-cli -m models/qwen3.5-2b-instruct-q4_k_m.gguf \
  -c 4096 \
  --threads 4 \
  -p "Explain the difference between TCP and UDP like I'm 5."

Metric                       Liquid LFM2.5-1.2B   Qwen 3.5-2B
GPQA (Intelligence)          38.9                 73.4
IFBench (Following Rules)    44.8                 59.2
RAM Target (Q4 Quantized)    ~900MB               ~1.4GB
Time to First Token          400ms                1.5s - 45s

The intelligence gap is massive. A 73.4 on GPQA means the Qwen model actually solves logic problems instead of just guessing the next word based on a memorized template.

But the hardware reality on the Pi 5 exposes the trade-offs. The Qwen model is physically larger. At 1.4GB quantized, it takes a bigger bite out of your shared system memory. It also demands massive compute to process its internal reasoning step.

Performance Trade-offs: Latency and Infinite Reasoning Loops

Qwen wins on raw benchmarks, but Liquid still owns latency-first edge deployments.

If you build a voice assistant or a hardware tool like my Voice Keyboard, you need an answer immediately. Liquid hits a time-to-first-token of roughly 400ms on the Pi.

Qwen suffers from its own intelligence. Because it uses scaled RL for reasoning, it frequently falls into infinite planning loops. You will watch the console spit out think tags for 45 seconds while the Pi's CPU pins at 100%.

<think>
Wait, the user wants a simple explanation.
Let me consider analogies. Cars? Mail?
Mail is good. TCP is registered mail. UDP is a postcard.
Let me refine this.
Is a postcard accurate? Yes, no delivery guarantee.
Wait, let me rethink the delivery mechanism...
</think>

It does this just to answer a simple explain-it-to-a-child prompt. Liquid lacks that internal planning phase. It never overthinks. It just gives you the best answer it has and moves on.
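One blunt mitigation is to give the model a fixed thinking budget: stream the output, and if a think block runs past the budget without closing, cut generation and keep only the visible text. This is a sketch, not anything either project ships; the tag names match what the console output above shows, and the budget number is an arbitrary assumption you would tune per device.

```python
# Sketch: capping runaway reasoning with a character budget on the
# <think> block. Budget value is an arbitrary assumption.
import re

THINK_BUDGET_CHARS = 2000  # assumption: tune for your hardware

def extract_answer(raw: str) -> str:
    """Drop a completed <think>...</think> block, or an unfinished one."""
    closed = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    if closed != raw:
        return closed.strip()
    # No closing tag yet: the model is still "planning". Keep visible text only.
    return re.sub(r"<think>.*", "", raw, flags=re.DOTALL).strip()

def stream_with_budget(token_stream):
    """Consume tokens; abort once the think phase exhausts the budget."""
    buf = ""
    for tok in token_stream:
        buf += tok
        if ("<think>" in buf and "</think>" not in buf
                and len(buf) > THINK_BUDGET_CHARS):
            break  # stop generating; caller re-prompts or accepts partial output
    return extract_answer(buf)
```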

Verdict: Choosing the Best Local AI Model for Your Project

If you work with a strict RAM budget or need instant voice responses, stick with the 1.2B Liquid model. It remains the most efficient text engine for the Pi.

If you build an async agent that needs to look at camera feeds, plan multi-step actions, or parse messy JSON logs, swap to the 2B Qwen model. The latency hit is worth the intelligence gain.
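The decision rule above is simple enough to write down as code. The thresholds here are my reading of this article's own numbers, not official guidance from either project.

```python
# Sketch: the verdict as a decision function. Thresholds taken from the
# measurements in this post, not from official documentation.

def pick_model(ram_budget_mb: int, needs_vision: bool,
               latency_critical: bool) -> str:
    """Return the model family this post would recommend."""
    if needs_vision:
        return "qwen3.5-2b"           # Liquid has no native vision path here
    if latency_critical or ram_budget_mb < 1400:
        return "liquid-lfm2.5-1.2b"   # ~400ms TTFT, ~900MB at Q4
    return "qwen3.5-2b"               # take the latency hit for reasoning

print(pick_model(ram_budget_mb=1024, needs_vision=False,
                 latency_critical=True))  # prints "liquid-lfm2.5-1.2b"
```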

There is also a wildcard here. Qwen released a 0.8B model in this series. It loads entirely into cache on some edge devices. I plan to test it specifically for single-purpose voice transcription pipelines next week.

The floor for local reasoning used to sit at 70B parameters. You needed a dedicated GPU rig just to get a model to plan its next move. Now we have 2B models doing native vision and complex planning on an 80-dollar single-board computer. The constraints are disappearing faster than the hardware changes.