Guide

Multi-Model Routing

Your AI agent doesn't need a €30-per-million-token model to check a heartbeat. One configuration change cuts costs by 50–80% without sacrificing quality where it matters.

8 min read · February 2026
01

The problem

By default, OpenClaw sends every request through your primary model. Heartbeat pings, sub-agent tasks, calendar lookups, architecture decisions: all routed to the same expensive frontier model.

It's the equivalent of commissioning a Renaissance master to paint a fence. Technically capable. Absurdly expensive. Completely unnecessary.

- 48×/day: heartbeats at €30/M
- 100+: sub-agents at frontier cost
- €940/mo: wasted on trivial tasks
Not all tasks require the same intelligence. A heartbeat needs a 200ms "alive" response, not multi-step reasoning across a 200K-token context.
02

Model tiering

Instead of one model for everything, assign different models to different task types based on what each one actually needs. Three tiers are sufficient.

Frontier

Your hardest problems: ambiguity, deep context, superior reasoning required.

Models: Opus 4.5 · GPT-5.2 · Gemini 3 Pro
Tasks: Architecture, multi-file refactoring, novel problems

Mid-Tier

80% of daily work. Comparable quality at roughly a tenth of the cost.

Models: DeepSeek R1 · Gemini 3 Flash · Sonnet 4.5
Tasks: Code generation, research, sub-agents

Budget

These tasks need a response, not reasoning. 60× cheaper, 6× faster.

Models: Flash-Lite · DeepSeek V3.2 · MiMo-V2
Tasks: Heartbeats, health checks, simple classification
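The tiering above is just a lookup table. A minimal sketch in Python; the task-type names and the `pick_model` function are illustrative, not part of OpenClaw's API:

```python
# Hypothetical routing table: tier -> representative model ID.
TIER_MODELS = {
    "frontier": "anthropic/claude-opus-4-5",
    "mid": "deepseek/deepseek-reasoner",
    "budget": "google/gemini-2.5-flash-lite",
}

# Task types mapped to the cheapest tier that can handle them.
TASK_TIERS = {
    "architecture": "frontier",
    "refactor": "frontier",
    "codegen": "mid",
    "research": "mid",
    "subagent": "mid",
    "heartbeat": "budget",
    "health_check": "budget",
    "classify": "budget",
}

def pick_model(task_type: str) -> str:
    """Route a task to its tier's model; unknown tasks play it safe."""
    tier = TASK_TIERS.get(task_type, "frontier")
    return TIER_MODELS[tier]
```

With this table, `pick_model("heartbeat")` resolves to Flash-Lite while `pick_model("architecture")` stays on Opus; anything unclassified defaults to the frontier tier rather than risking a cheap model on a hard problem.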
03

Model comparison

Every major model compared by cost, speed, and use case, ordered by price within tier.

| Model | Provider | Tier | Cost / 1M tokens | Speed | Best for |
|---|---|---|---|---|---|
| MiMo-V2-Flash | Xiaomi | Budget | €0.40 | 320 t/s | Heartbeats, ping checks |
| Gemini 2.5 Flash-Lite | Google | Budget | €0.50 | 300 t/s | Heartbeats, simple tasks |
| DeepSeek V3.2 | DeepSeek | Budget | €0.53 | 280 t/s | Classification, simple queries |
| GLM 4.7 | Zhipu | Budget | €0.96 | 200 t/s | Coding, 200K context |
| Kimi K2 Thinking | Moonshot | Mid-Tier | €2.15 | 150 t/s | Reasoning, budget option |
| DeepSeek R1 | DeepSeek | Mid-Tier | €2.74 | 130 t/s | Reasoning, sub-agents |
| Gemini 3 Flash | Google | Mid-Tier | €3.50 | 250 t/s | Fast responses, mid-tier tasks |
| GPT-5 | OpenAI | Frontier | €11.25 | 80 t/s | Frontier reasoning, best value |
| Gemini 3 Pro | Google | Frontier | €14.00 | 70 t/s | Frontier work, 1M context |
| GPT-5.2 | OpenAI | Frontier | €15.75 | 65 t/s | Latest flagship, complex tasks |
| Claude Sonnet 4.5 | Anthropic | Frontier | €18.00 | 60 t/s | Premium coding, analysis |
| Claude Opus 4.5 | Anthropic | Frontier | €30.00 | 50 t/s | Complex synthesis, architecture |
Gemini 2.5 Flash-Lite at €0.50/M is 60× cheaper than Claude Opus 4.5 at €30.00/M, and 6× faster. For a heartbeat, there is zero quality difference.
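Both ratios follow directly from the table figures:

```python
# Price and speed ratios between Flash-Lite and Opus 4.5,
# using the numbers from the comparison table above.
opus_cost, flashlite_cost = 30.00, 0.50    # EUR per 1M tokens
opus_speed, flashlite_speed = 50, 300      # tokens per second

cost_ratio = opus_cost / flashlite_cost    # 60x cheaper
speed_ratio = flashlite_speed / opus_speed # 6x faster
```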
04

Routing, visualized

Without routing, every request funnels through one expensive model. With routing, each task type is matched to its most cost-effective model.

Without routing: all requests (heartbeats, sub-agents, queries) go to Claude Opus 4.5 at €30.00/M tokens.

With routing, an intelligent router dispatches by task type:

- Heartbeats → Flash-Lite (€0.50/M)
- Sub-agents → DeepSeek R1 (€2.74/M)
- Complex work → Opus 4.5 (€30.00/M)
05

Calculate your savings

A worked example, using a typical workload of 48 heartbeats, 100 sub-agent tasks, and 50 complex queries per day:

- Before: €2496.60/month
- After: €1248.66/month
- Saved: €1247.94/month (a 50% reduction)

Breakdown:

- Heartbeats: €21.60 → €0.36 (98% saved)
- Sub-agents: €1350.00 → €123.30 (91% saved)
- Queries: €1125.00 → €1125.00 (0%; complex work stays on frontier)
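Those totals reproduce with straightforward arithmetic. A sketch; the per-task token counts (500 per heartbeat, 15K per sub-agent run, 25K per query) are back-calculated from the totals above, not measured values:

```python
# Monthly cost = tasks/day * 30 days * tokens/task * EUR per token.
# Token counts per task are assumptions inferred from the totals.
def monthly_cost(per_day, tokens_per_task, eur_per_million):
    return per_day * 30 * tokens_per_task * eur_per_million / 1_000_000

# Before: everything on Opus 4.5 at EUR 30/M.
before = (
    monthly_cost(48, 500, 30.00)        # heartbeats  -> 21.60
    + monthly_cost(100, 15_000, 30.00)  # sub-agents  -> 1350.00
    + monthly_cost(50, 25_000, 30.00)   # queries     -> 1125.00
)

# After: heartbeats on Flash-Lite, sub-agents on DeepSeek R1,
# complex queries stay on Opus.
after = (
    monthly_cost(48, 500, 0.50)         # -> 0.36
    + monthly_cost(100, 15_000, 2.74)   # -> 123.30
    + monthly_cost(50, 25_000, 30.00)   # -> 1125.00
)
```

Running this gives €2496.60 before and €1248.66 after: queries dominate the remaining bill, which is exactly the point — only the work that needs frontier quality pays frontier prices.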
06

Implementation

Two paths: manual configuration for full control, or OpenRouter's auto-router for zero setup. The manual approach is recommended for production.

The default config

Where most people start: a single model for everything.

~/.openclaw/openclaw.json
{
  "agents": {
    "defaults": {
      "model": "anthropic/claude-opus-4-5"
    }
  }
}

The optimized config

The key additions: a heartbeat block, a subagents block, and a fallback chain that spans providers.

~/.openclaw/openclaw.json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-opus-4-5",
        "fallbacks": [
          "openai/gpt-5.2",
          "deepseek/deepseek-reasoner",
          "google/gemini-3-flash"
        ]
      },
      "heartbeat": {
        "every": "30m",
        "model": "google/gemini-2.5-flash-lite",
        "target": "last"
      },
      "subagents": {
        "model": "deepseek/deepseek-reasoner",
        "maxConcurrent": 1,
        "archiveAfterMinutes": 60
      },
      "contextTokens": 200000
    }
  }
}
The first fallback is GPT-5.2 (OpenAI), not Sonnet (Anthropic). If Anthropic hits rate limits, all their models slow down. A different provider keeps you running.
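The fallback behaviour itself is simple to reason about. A minimal sketch of a cross-provider chain, assuming a generic `call_model(model_id, prompt)` function and a `RateLimitError` exception; neither is OpenClaw's actual API:

```python
class RateLimitError(Exception):
    """Raised when a provider rejects a request due to rate limits."""

def call_with_fallbacks(prompt, chain, call_model):
    # Try each model in order; a rate-limited provider is skipped.
    # Ordering the chain across providers (Anthropic -> OpenAI ->
    # DeepSeek -> Google) means one provider's outage can't stall us.
    last_err = None
    for model_id in chain:
        try:
            return call_model(model_id, prompt)
        except RateLimitError as err:
            last_err = err  # move on to the next provider
    raise last_err or RuntimeError("empty fallback chain")

# Same ordering as the config above: every hop is a different provider.
CHAIN = [
    "anthropic/claude-opus-4-5",
    "openai/gpt-5.2",
    "deepseek/deepseek-reasoner",
    "google/gemini-3-flash",
]
```

If the Anthropic call raises `RateLimitError`, the very next attempt already hits a different provider, which is the whole argument for not putting Sonnet second.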
07

Config generator

A fuller example configuration: core models with aliases, task-specific models, and an image-model fallback. Copy the JSON directly into your config file.

Use different providers for primary and fallback. If Anthropic is rate-limited, falling back to another Anthropic model won't help.

~/.openclaw/openclaw.json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-opus-4-5",
        "fallbacks": [
          "openai/gpt-5.2",
          "deepseek/deepseek-reasoner",
          "google/gemini-3-flash"
        ]
      },
      "models": {
        "anthropic/claude-opus-4-5": { "alias": "opus" },
        "openai/gpt-5.2": { "alias": "gpt52" },
        "google/gemini-3-flash": { "alias": "flash" },
        "deepseek/deepseek-reasoner": { "alias": "sub" }
      },
      "heartbeat": {
        "every": "30m",
        "model": "google/gemini-2.5-flash-lite",
        "target": "last"
      },
      "subagents": {
        "model": "deepseek/deepseek-reasoner",
        "maxConcurrent": 1,
        "archiveAfterMinutes": 60
      },
      "imageModel": {
        "primary": "google/gemini-3-flash",
        "fallbacks": [
          "openai/gpt-5.2"
        ]
      },
      "contextTokens": 200000
    }
  }
}
08

Runtime switching

The /model command switches models mid-session without editing config files:

/model              # Opens model picker
/model sonnet       # Switch to Sonnet
/model flash        # Switch to Gemini 3 Flash
/model ds           # Switch to DeepSeek
/model opus         # Back to Opus for complex work

The aliases come from the models block in your config. Stay on your primary model for complex work, drop to a budget model for quick questions, switch back.

09

Free tier traps

Free models seem appealing. In practice, they will cost you more than the €0.50/M paid alternative.

1. Aggressive rate limits. Hit them mid-task and your agent stops cold.
2. Unpredictable speed. Shared infrastructure: fast sometimes, painfully slow at peak hours.
3. Zero guarantees. Free tiers disappear overnight without notice.

Budget paid models cost almost nothing but come with reliability guarantees. For an agent running 24/7, that reliability is worth the pennies.

Begin

One config file. Three key changes. Route heartbeats to a budget model, sub-agents to mid-tier, add cross-provider fallbacks. Your complex work stays on frontier models. Everything else gets cheaper.