The problem
By default, OpenClaw sends every request through your primary model. Heartbeat pings, sub-agent tasks, calendar lookups, architecture decisions: all routed to the same expensive frontier model.
It's the equivalent of commissioning a Renaissance master to paint a fence. Technically capable. Absurdly expensive. Completely unnecessary.
Model tiering
Instead of one model for everything, assign different models to different task types based on what each one actually needs. Three tiers are sufficient:
- Frontier: your hardest problems. Ambiguity, deep context, superior reasoning required.
- Mid-tier: 80% of daily work. Comparable quality at roughly 10× lower cost.
- Budget: heartbeats, pings, classification. These need a response, not reasoning. 60× cheaper, 6× faster.
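The routing idea can be sketched in a few lines. This is a hypothetical illustration, not OpenClaw's actual routing logic; the task categories and model choices are assumptions drawn from the tiers above:

```python
# Illustrative tier routing: map task types to tiers, tiers to models.
# Task categories and model IDs are examples, not an OpenClaw API.
TIER_FOR_TASK = {
    "architecture": "frontier",
    "coding": "mid",
    "heartbeat": "budget",
    "classification": "budget",
}

MODEL_FOR_TIER = {
    "frontier": "anthropic/claude-opus-4-5",
    "mid": "deepseek/deepseek-reasoner",
    "budget": "google/gemini-2.5-flash-lite",
}

def route(task_type: str) -> str:
    """Pick the model for a task; unknown tasks default to the safe tier."""
    tier = TIER_FOR_TASK.get(task_type, "frontier")
    return MODEL_FOR_TIER[tier]

print(route("heartbeat"))  # → google/gemini-2.5-flash-lite
```

The important property is the default: anything unclassified falls through to the frontier tier, so mis-routing degrades cost, never quality.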
Model comparison
Every major model compared by cost, speed, and use case.
| Model | Provider | Tier | Cost / 1M tokens | Speed | Best for |
|---|---|---|---|---|---|
| MiMo-V2-Flash | Xiaomi | Budget | €0.40 | 320 t/s | Heartbeats, ping checks |
| Gemini 2.5 Flash-Lite | Google | Budget | €0.50 | 300 t/s | Heartbeats, simple tasks |
| DeepSeek V3.2 | DeepSeek | Budget | €0.53 | 280 t/s | Classification, simple queries |
| GLM 4.7 | Zhipu | Budget | €0.96 | 200 t/s | Coding, 200K context |
| Kimi K2 Thinking | Moonshot | Mid-Tier | €2.15 | 150 t/s | Reasoning, budget option |
| DeepSeek R1 | DeepSeek | Mid-Tier | €2.74 | 130 t/s | Reasoning, sub-agents |
| Gemini 3 Flash | Google | Mid-Tier | €3.50 | 250 t/s | Fast responses, mid-tier tasks |
| GPT-5 | OpenAI | Frontier | €11.25 | 80 t/s | Frontier reasoning, best value |
| Gemini 3 Pro | Google | Frontier | €14.00 | 70 t/s | Frontier, 1M context |
| GPT-5.2 | OpenAI | Frontier | €15.75 | 65 t/s | Latest flagship, complex tasks |
| Claude Sonnet 4.5 | Anthropic | Frontier | €18.00 | 60 t/s | Premium coding, analysis |
| Claude Opus 4.5 | Anthropic | Frontier | €30.00 | 50 t/s | Complex synthesis, architecture |
Routing, visualized
Picture two funnels. On the left, every request funneled through one expensive model. On the right, each task type matched to its most cost-effective model.
Calculate your savings
No calculator needed: monthly cost is just the token volume in each tier multiplied by that tier's price, summed.
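The estimate is easy to script. In this sketch the monthly volume, traffic split, and prices are all assumptions to replace with your own numbers; the prices are illustrative figures from the comparison table:

```python
def monthly_cost(tokens_m: float, price_per_m: float) -> float:
    """Cost in EUR for a monthly volume given in millions of tokens."""
    return tokens_m * price_per_m

# Assumed usage: 50M tokens/month, split 10/80/10 across tiers.
usage_m = 50
split = {"frontier": 0.10, "mid": 0.80, "budget": 0.10}
prices = {"frontier": 30.00, "mid": 2.74, "budget": 0.50}

before = monthly_cost(usage_m, prices["frontier"])
after = sum(monthly_cost(usage_m * split[t], prices[t]) for t in split)

print(f"Single frontier model: €{before:.2f}/month")
print(f"Tiered routing:        €{after:.2f}/month")
```

With these assumed numbers, tiering cuts the bill by over 80%; the exact figure depends entirely on how much of your traffic really is budget-tier work.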
Implementation
Two paths: manual configuration for full control, or OpenRouter's auto-router for zero setup. The manual approach is recommended for production.
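For the zero-setup path, OpenRouter exposes its auto-router as an ordinary model ID, `openrouter/auto`. A sketch, assuming OpenClaw accepts OpenRouter model IDs in the same `model` field:

```json
{
  "agents": {
    "defaults": {
      "model": "openrouter/auto"
    }
  }
}
```

The trade-off: the router's tier decisions are opaque, which is why the manual approach below is preferred for production.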
The default config
Where most people start: a single model for everything.
```json
{
  "agents": {
    "defaults": {
      "model": "anthropic/claude-opus-4-5"
    }
  }
}
```
The optimized config
The key additions: a heartbeat block, a subagents block, and a fallback chain that spans providers.
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-opus-4-5",
        "fallbacks": [
          "openai/gpt-5.2",
          "deepseek/deepseek-reasoner",
          "google/gemini-3-flash"
        ]
      },
      "heartbeat": {
        "every": "30m",
        "model": "google/gemini-2.5-flash-lite",
        "target": "last"
      },
      "subagents": {
        "model": "deepseek/deepseek-reasoner",
        "maxConcurrent": 1,
        "archiveAfterMinutes": 60
      },
      "contextTokens": 200000
    }
  }
}
```
The full config
Everything combined: model aliases for runtime switching, heartbeat and sub-agent routing, an image model, and cross-provider fallbacks. Copy it directly into your config file.
Use different providers for primary and fallback. If Anthropic is rate-limited, falling back to another Anthropic model won't help.
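That rule is easy to verify mechanically. A sketch that checks provider diversity in a fallback chain; the config is inlined here purely for illustration:

```python
import json

# Sanity check: do the primary and fallback models span more than one
# provider? A same-provider chain fails together when that provider
# rate-limits. The provider is the prefix before the "/" in each ID.
config = json.loads("""
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-opus-4-5",
        "fallbacks": ["openai/gpt-5.2", "deepseek/deepseek-reasoner"]
      }
    }
  }
}
""")

model = config["agents"]["defaults"]["model"]
chain = [model["primary"], *model["fallbacks"]]
providers = {m.split("/")[0] for m in chain}

assert len(providers) > 1, "fallback chain is single-provider"
print(f"OK: {len(providers)} providers in chain: {sorted(providers)}")
```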
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-opus-4-5",
        "fallbacks": [
          "openai/gpt-5.2",
          "deepseek/deepseek-reasoner",
          "google/gemini-3-flash"
        ]
      },
      "models": {
        "anthropic/claude-opus-4-5": { "alias": "opus" },
        "openai/gpt-5.2": { "alias": "gpt52" },
        "google/gemini-3-flash": { "alias": "flash" },
        "deepseek/deepseek-reasoner": { "alias": "sub" }
      },
      "heartbeat": {
        "every": "30m",
        "model": "google/gemini-2.5-flash-lite",
        "target": "last"
      },
      "subagents": {
        "model": "deepseek/deepseek-reasoner",
        "maxConcurrent": 1,
        "archiveAfterMinutes": 60
      },
      "imageModel": {
        "primary": "google/gemini-3-flash",
        "fallbacks": ["openai/gpt-5.2"]
      },
      "contextTokens": 200000
    }
  }
}
```
Runtime switching
The /model command switches models mid-session without editing config files:
```
/model           # opens the model picker
/model gpt52     # switch to GPT-5.2
/model flash     # switch to Gemini 3 Flash
/model sub       # switch to DeepSeek
/model opus      # back to Opus for complex work
```
The aliases come from the models block in your config. Stay on your primary model for complex work, drop to a budget model for quick questions, switch back.
Free tier traps
Free models seem appealing, but without reliability guarantees they fail in exactly the ways a 24/7 agent can't tolerate, and they will cost you more than the €0.50/M alternative.
Budget paid models cost almost nothing but come with reliability guarantees. For an agent running 24/7, that reliability is worth the pennies.
Begin
One config file. Three key changes. Route heartbeats to a budget model, sub-agents to mid-tier, add cross-provider fallbacks. Your complex work stays on frontier models. Everything else gets cheaper.