Guide

Multi-Model Routing

Your AI agent doesn't need a €30-per-million-token model to check a heartbeat. One configuration change cuts costs by 50–80% without sacrificing quality where it matters.

8 min read · February 2026
01

The problem

By default, OpenClaw sends every request through your primary model. Heartbeat pings, sub-agent tasks, calendar lookups, architecture decisions: all routed to the same expensive frontier model.

It's the equivalent of commissioning a Renaissance master to paint a fence. Technically capable. Absurdly expensive. Completely unnecessary.

- 48×/day: heartbeats at €30/M
- 100+: sub-agents at frontier cost
- €940/mo: wasted on trivial tasks
Not all tasks require the same intelligence. A heartbeat needs a 200ms "alive" response, not multi-step reasoning across a 200K-token context.
02

Model tiering

Instead of one model for everything, assign different models to different task types based on what each one actually needs. Three tiers are sufficient.

Frontier

Your hardest problems: ambiguity, deep context, superior reasoning required.

Models: Opus 4.5 · GPT-5.2 · Gemini 3 Pro
Tasks: Architecture, multi-file refactoring, novel problems

Mid-Tier

80% of daily work. Comparable quality at roughly a tenth of the cost.

Models: DeepSeek R1 · Gemini 3 Flash · Sonnet 4.5
Tasks: Code generation, research, sub-agents

Budget

These tasks need a response, not reasoning. 60× cheaper, 6× faster.

Models: Flash-Lite · DeepSeek V3.2 · MiMo-V2
Tasks: Heartbeats, health checks, simple classification
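The tiering above is just a lookup table. A minimal sketch in Python; the task-type names and the `pick_model` function are illustrative, not part of OpenClaw's API:

```python
# Hypothetical routing table: tier -> representative model ID.
TIER_MODELS = {
    "frontier": "anthropic/claude-opus-4-5",
    "mid": "deepseek/deepseek-reasoner",
    "budget": "google/gemini-2.5-flash-lite",
}

# Task types mapped to the cheapest tier that can handle them.
TASK_TIERS = {
    "architecture": "frontier",
    "refactor": "frontier",
    "codegen": "mid",
    "research": "mid",
    "subagent": "mid",
    "heartbeat": "budget",
    "health_check": "budget",
    "classify": "budget",
}

def pick_model(task_type: str) -> str:
    """Route a task to its tier's model; unknown tasks play it safe."""
    tier = TASK_TIERS.get(task_type, "frontier")
    return TIER_MODELS[tier]
```

With this table, `pick_model("heartbeat")` resolves to Flash-Lite while `pick_model("architecture")` stays on Opus; anything unclassified defaults to the frontier tier rather than risking a cheap model on a hard problem.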
03

Model comparison

Every major model compared by cost, speed, and use case, ordered by price within tier.

| Model | Provider | Tier | Cost / 1M tokens | Speed | Best for |
|---|---|---|---|---|---|
| MiMo-V2-Flash | Xiaomi | Budget | €0.40 | 320 t/s | Heartbeats, ping checks |
| Gemini 2.5 Flash-Lite | Google | Budget | €0.50 | 300 t/s | Heartbeats, simple tasks |
| DeepSeek V3.2 | DeepSeek | Budget | €0.53 | 280 t/s | Classification, simple queries |
| GLM 4.7 | Zhipu | Budget | €0.96 | 200 t/s | Coding, 200K context |
| Kimi K2 Thinking | Moonshot | Mid-Tier | €2.15 | 150 t/s | Reasoning, budget option |
| DeepSeek R1 | DeepSeek | Mid-Tier | €2.74 | 130 t/s | Reasoning, sub-agents |
| Gemini 3 Flash | Google | Mid-Tier | €3.50 | 250 t/s | Fast responses, mid-tier tasks |
| GPT-5 | OpenAI | Frontier | €11.25 | 80 t/s | Frontier reasoning, best value |
| Gemini 3 Pro | Google | Frontier | €14.00 | 70 t/s | Frontier work, 1M context |
| GPT-5.2 | OpenAI | Frontier | €15.75 | 65 t/s | Latest flagship, complex tasks |
| Claude Sonnet 4.5 | Anthropic | Frontier | €18.00 | 60 t/s | Premium coding, analysis |
| Claude Opus 4.5 | Anthropic | Frontier | €30.00 | 50 t/s | Complex synthesis, architecture |
Gemini 2.5 Flash-Lite at €0.50/M is 60× cheaper than Claude Opus 4.5 at €30.00/M, and 6× faster. For a heartbeat, there is zero quality difference.
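Both ratios follow directly from the table figures:

```python
# Price and speed ratios between Flash-Lite and Opus 4.5,
# using the numbers from the comparison table above.
opus_cost, flashlite_cost = 30.00, 0.50    # EUR per 1M tokens
opus_speed, flashlite_speed = 50, 300      # tokens per second

cost_ratio = opus_cost / flashlite_cost    # 60x cheaper
speed_ratio = flashlite_speed / opus_speed # 6x faster
```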
04

Routing, visualized

Without routing, every request funnels through one expensive model. With routing, each task type is matched to its most cost-effective model.

Without routing: all requests (heartbeats, sub-agents, queries) go to Claude Opus 4.5 at €30.00/M tokens.

With routing, an intelligent router dispatches by task type:

- Heartbeats → Flash-Lite (€0.50/M)
- Sub-agents → DeepSeek R1 (€2.74/M)
- Complex work → Opus 4.5 (€30.00/M)
05

Calculate your savings

A worked example, using a typical workload of 48 heartbeats, 100 sub-agent tasks, and 50 complex queries per day:

- Before: €2496.60/month
- After: €1248.66/month
- Saved: €1247.94/month (a 50% reduction)

Breakdown:

- Heartbeats: €21.60 → €0.36 (98% saved)
- Sub-agents: €1350.00 → €123.30 (91% saved)
- Queries: €1125.00 → €1125.00 (0%; complex work stays on frontier)
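Those totals reproduce with straightforward arithmetic. A sketch; the per-task token counts (500 per heartbeat, 15K per sub-agent run, 25K per query) are back-calculated from the totals above, not measured values:

```python
# Monthly cost = tasks/day * 30 days * tokens/task * EUR per token.
# Token counts per task are assumptions inferred from the totals.
def monthly_cost(per_day, tokens_per_task, eur_per_million):
    return per_day * 30 * tokens_per_task * eur_per_million / 1_000_000

# Before: everything on Opus 4.5 at EUR 30/M.
before = (
    monthly_cost(48, 500, 30.00)        # heartbeats  -> 21.60
    + monthly_cost(100, 15_000, 30.00)  # sub-agents  -> 1350.00
    + monthly_cost(50, 25_000, 30.00)   # queries     -> 1125.00
)

# After: heartbeats on Flash-Lite, sub-agents on DeepSeek R1,
# complex queries stay on Opus.
after = (
    monthly_cost(48, 500, 0.50)         # -> 0.36
    + monthly_cost(100, 15_000, 2.74)   # -> 123.30
    + monthly_cost(50, 25_000, 30.00)   # -> 1125.00
)
```

Running this gives €2496.60 before and €1248.66 after: queries dominate the remaining bill, which is exactly the point — only the work that needs frontier quality pays frontier prices.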
06

Implementation

Two paths: manual configuration for full control, or OpenRouter's auto-router for zero setup. The manual approach is recommended for production.

The default config

Where most people start: a single model for everything.

~/.openclaw/openclaw.json
{
  "agents": {
    "defaults": {
      "model": "anthropic/claude-opus-4-5"
    }
  }
}

The optimized config

The key additions: a heartbeat block, a subagents block, and a fallback chain that spans providers.

~/.openclaw/openclaw.json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-opus-4-5",
        "fallbacks": [
          "openai/gpt-5.2",
          "deepseek/deepseek-reasoner",
          "google/gemini-3-flash"
        ]
      },
      "heartbeat": {
        "every": "30m",
        "model": "google/gemini-2.5-flash-lite",
        "target": "last"
      },
      "subagents": {
        "model": "deepseek/deepseek-reasoner",
        "maxConcurrent": 1,
        "archiveAfterMinutes": 60
      },
      "contextTokens": 200000
    }
  }
}
The first fallback is GPT-5.2 (OpenAI), not Sonnet (Anthropic). If Anthropic hits rate limits, all their models slow down. A different provider keeps you running.
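The fallback behaviour itself is simple to reason about. A minimal sketch of a cross-provider chain, assuming a generic `call_model(model_id, prompt)` function and a `RateLimitError` exception; neither is OpenClaw's actual API:

```python
class RateLimitError(Exception):
    """Raised when a provider rejects a request due to rate limits."""

def call_with_fallbacks(prompt, chain, call_model):
    # Try each model in order; a rate-limited provider is skipped.
    # Ordering the chain across providers (Anthropic -> OpenAI ->
    # DeepSeek -> Google) means one provider's outage can't stall us.
    last_err = None
    for model_id in chain:
        try:
            return call_model(model_id, prompt)
        except RateLimitError as err:
            last_err = err  # move on to the next provider
    raise last_err or RuntimeError("empty fallback chain")

# Same ordering as the config above: every hop is a different provider.
CHAIN = [
    "anthropic/claude-opus-4-5",
    "openai/gpt-5.2",
    "deepseek/deepseek-reasoner",
    "google/gemini-3-flash",
]
```

If the Anthropic call raises `RateLimitError`, the very next attempt already hits a different provider, which is the whole argument for not putting Sonnet second.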
07

Config generator

A fuller example configuration: core models with aliases, task-specific models, and an image-model fallback. Copy the JSON directly into your config file.

Use different providers for primary and fallback. If Anthropic is rate-limited, falling back to another Anthropic model won't help.

~/.openclaw/openclaw.json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-opus-4-5",
        "fallbacks": [
          "openai/gpt-5.2",
          "deepseek/deepseek-reasoner",
          "google/gemini-3-flash"
        ]
      },
      "models": {
        "anthropic/claude-opus-4-5": { "alias": "opus" },
        "openai/gpt-5.2": { "alias": "gpt52" },
        "google/gemini-3-flash": { "alias": "flash" },
        "deepseek/deepseek-reasoner": { "alias": "sub" }
      },
      "heartbeat": {
        "every": "30m",
        "model": "google/gemini-2.5-flash-lite",
        "target": "last"
      },
      "subagents": {
        "model": "deepseek/deepseek-reasoner",
        "maxConcurrent": 1,
        "archiveAfterMinutes": 60
      },
      "imageModel": {
        "primary": "google/gemini-3-flash",
        "fallbacks": [
          "openai/gpt-5.2"
        ]
      },
      "contextTokens": 200000
    }
  }
}
08

Runtime switching

The /model command switches models mid-session without editing config files:

/model              # Opens model picker
/model sonnet       # Switch to Sonnet
/model flash        # Switch to Gemini 3 Flash
/model ds           # Switch to DeepSeek
/model opus         # Back to Opus for complex work

The aliases come from the models block in your config. Stay on your primary model for complex work, drop to a budget model for quick questions, switch back.

09

Free tier traps

Free models seem appealing. In practice, they will cost you more than the €0.50/M paid alternative.

1. Aggressive rate limits. Hit them mid-task and your agent stops cold.
2. Unpredictable speed. Shared infrastructure: fast sometimes, painfully slow at peak hours.
3. Zero guarantees. Free tiers disappear overnight without notice.

Budget paid models cost almost nothing but come with reliability guarantees. For an agent running 24/7, that reliability is worth the pennies.

Begin

One config file. Three key changes. Route heartbeats to a budget model, sub-agents to mid-tier, add cross-provider fallbacks. Your complex work stays on frontier models. Everything else gets cheaper.