# Multi-model routing: smart agents, cheap execution
I spent the first month using Opus for everything. Every rename, every boilerplate file, every “add a console.log here” task went through the most expensive model available. It worked fine. It also cost way more than it needed to.
The problem isn’t that cheap models are bad. It’s that most AI coding setups treat every task the same. You type a prompt, the model runs, you get output. Whether you’re asking it to rename a variable or redesign your authentication system, it uses the same model at the same price.
Multi-model routing fixes this by matching task complexity to model capability. Simple tasks go to fast, cheap models. Complex tasks go to the best model you have. The result: your hard problems get full attention, and your easy problems stop burning through your budget.
## The routing spectrum
Not all tasks need the same level of intelligence. Here’s roughly how I think about it:
**Cheap model territory (MiniMax, Haiku, o4-mini):**
- Renaming variables and functions
- Generating boilerplate (test files, config files, type definitions)
- Formatting and linting fixes
- Simple refactors with clear instructions
- Writing commit messages
- Mechanical code changes (“add error handling to every route”)
**Mid-range model territory (Sonnet, GPT-5.2):**
- Writing first drafts of documentation
- Code review and analysis
- Research and summarization
- Implementing features from detailed specs
- Debugging with clear reproduction steps
**Top model territory (Opus, GPT-5.2/5.3 Codex High):**
- Architectural decisions
- Complex multi-file refactors
- Debugging subtle issues without clear reproduction
- Writing code that requires deep understanding of the codebase
- Anything where getting it wrong means hours of cleanup
The exact boundaries depend on the models available to you and your tolerance for quality variation. But the principle holds: most of the tasks you do in a day don’t need your most capable model.
## What this looks like in practice
I run a four-model setup. The main session uses Opus for direct conversation and complex work. Subagents get routed to cheaper models based on what they’re doing.
The routing looks like this:
- **Opus:** Main session, architectural work, anything that requires judgment or understanding my full project context
- **Sonnet:** Analysis tasks, code review, writing drafts, research summarization
- **MiniMax (or equivalent cheap model):** Mechanical implementation, boilerplate generation, bulk file changes, anything with very explicit instructions
- **Haiku/mini:** Fast lookups, formatting, simple transformations
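One way to make this concrete is a small lookup table. This is only a sketch: the task categories and model identifiers below are illustrative placeholders, not tied to any real provider's API.

```python
# Illustrative routing table: task category -> model tier.
# Both the category names and the model identifiers are hypothetical.
ROUTING = {
    "rename": "minimax",
    "boilerplate": "minimax",
    "formatting": "haiku",
    "lookup": "haiku",
    "code-review": "sonnet",
    "docs-draft": "sonnet",
    "research": "sonnet",
    "architecture": "opus",
    "complex-debug": "opus",
}

def pick_model(category: str) -> str:
    # Default to the most capable model: an expensive fallback is
    # safer than sending an unrecognized task to the cheap tier.
    return ROUTING.get(category, "opus")
```

The fallback direction is the important design choice: when in doubt, route up, not down.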
The key insight: subagents doing implementation work from a detailed plan don’t need Opus. If the plan says “create a file at src/utils/validate.ts that exports a function validateEmail using this regex,” MiniMax handles that perfectly. Opus wrote the plan. MiniMax executes it.
## The plan-then-execute pattern
This is where routing pays off the most. The workflow:
1. **Plan with your best model.** Use Opus (or whatever your top model is) to analyze the codebase, design the approach, and write a detailed execution plan. This is the thinking step, and it’s worth paying for.
2. **Execute with your cheapest model.** Hand the plan to subagents running on MiniMax or Haiku. Each subagent gets one piece of the plan with explicit instructions.
3. **Review with a mid-range model.** Run Sonnet on the output to check for issues. It catches bugs that MiniMax missed without needing Opus-level capability.
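As a rough sketch, the loop looks like this. `call_model` is a stand-in for whatever client or agent API you actually use, and the model names are placeholders; treat this as the shape of the pattern, not an implementation.

```python
def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real provider call; replace with your client's API.
    return f"[{model}] {prompt}"

def plan_then_execute(feature_request: str) -> list:
    # 1. Think with the top model. The plan must be self-contained:
    #    one executable step per line, no "go read this other file".
    plan = call_model("opus", f"Write a step-per-line plan for: {feature_request}")
    results = []
    for step in plan.splitlines():
        # 2. Execute each step with the cheap model.
        output = call_model("minimax", f"Execute exactly: {step}")
        # 3. Review with the mid-range model before accepting the output.
        review = call_model("sonnet", f"Check for bugs and convention drift: {output}")
        results.append((step, output, review))
    return results
```

The expensive call happens once, per plan; the cheap call happens once per step, which is where the volume is.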
The plan is the critical artifact. A vague plan (“implement the feature”) sent to a cheap model produces garbage. A detailed plan (“create this file with this function that takes these parameters and returns this type, following the pattern in src/utils/format.ts”) produces good code from even the cheapest model.
I learned this the hard way. My first attempt at routing sent tasks to MiniMax with instructions like “follow the plan in docs/plan.md.” MiniMax would read the plan, miss half the context, and produce skeleton code with TODOs everywhere. The fix: write plans that contain everything the executing agent needs. No references to other files it needs to go read. No “follow the same pattern.” Explicit, complete instructions.
## Where routing goes wrong
**Over-routing simple things.** If you spend more time deciding which model to use than the task takes to complete, you’re over-optimizing. For quick one-off questions in your main session, just use whatever model is loaded. The routing overhead isn’t worth it for a 30-second task.

**Under-specifying tasks for cheap models.** Cheap models are capable, but they need clear instructions. “Refactor the auth module” sent to MiniMax will produce something, but probably not what you wanted. “In src/auth/login.ts, extract the token validation logic (lines 45-78) into a new function called validateToken in src/auth/tokens.ts, with this signature: …” will produce exactly what you need.

**Routing complex tasks to cheap models to save money.** Some tasks genuinely need a capable model. Debugging a race condition, designing an API surface, understanding why a test fails intermittently. These are not the place to save money. Use your best model and move on.

**Ignoring the review step.** Cheap model output needs a review pass. Not because it’s always wrong, but because when it is wrong, it’s confidently wrong. A quick review with Sonnet catches things like misunderstood requirements, missing edge cases, or implementations that technically work but violate your project’s conventions.
## Cost impact
I don’t have precise numbers because usage varies week to week. But the rough shape: before routing, everything ran on Opus. After routing, maybe 20-30% of tasks still use Opus, 20-30% use Sonnet, and the rest use MiniMax or equivalent. The total cost dropped significantly, and I don’t notice a quality difference on the tasks that moved to cheaper models.
The tasks that stayed on Opus are the ones where I’d notice if the quality dropped. Architecture, complex debugging, nuanced code review. Everything else moved down without incident.
## Setting this up
The implementation depends on your tools. Some editors support model routing natively. Others need workarounds.
**Claude Code:** Supports model selection per subagent. When spawning a Task, you can specify which model it uses. Your main session stays on Opus while subagents run on Sonnet or Haiku.

**Cursor:** Lets you switch between models in settings. No per-task routing, but you can manually switch to a cheaper model for simple work and back to a capable model for complex tasks.

**Multi-agent setups (OpenClaw, custom orchestrators):** Route at the orchestration layer. The orchestrator decides which model handles each task based on the task description, file count, or explicit tags.

**Manual routing:** The simplest version. Use your main editor with a capable model for complex work. Use a separate terminal or cheaper tool for simple tasks. Not elegant, but effective.
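For the orchestration-layer case, the decision can key off cheap signals in the task itself. The keywords, thresholds, and model names below are assumptions for illustration, not the behavior of any particular tool:

```python
def route(task_description: str, files_touched: int) -> str:
    # Keywords and thresholds are illustrative; tune them to your own tasks.
    text = task_description.lower()
    if any(word in text for word in ("architect", "design", "race condition")):
        return "opus"      # judgment-heavy work gets the top model
    if files_touched > 5 or "review" in text:
        return "sonnet"    # larger changes and review passes go mid-range
    return "minimax"       # mechanical work defaults to the cheap tier
```

A keyword heuristic like this is crude, but it only has to be right often enough to beat routing everything to one model.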
If your setup doesn’t support automatic routing, start with manual routing. The habit of thinking “does this task need my best model?” before hitting enter is worth more than any automation.
## Matching routing to skills
If you’re using Agent Skills, routing pairs naturally with skill design. A skill’s instructions can specify what level of capability it expects:
```
---
name: bulk-rename
description: Rename variables, functions, or files across the codebase. Use for mechanical renaming tasks.
---

This skill performs mechanical renaming. It does not require deep codebase
understanding. Route to a fast, cheap model when available.
...
```

```
---
name: architecture-review
description: Review and critique architectural decisions, dependency graphs, and system design.
---

This skill requires deep understanding of the codebase and design trade-offs.
Route to the most capable model available.
...
```
The routing hint in the skill instructions tells the agent (or the orchestrator) what capability level to expect. This isn’t enforced by the Agent Skills spec, but it’s a useful convention.
## Start simple
If you’re not routing at all today, here’s how to start:
1. **Identify your cheap tasks.** For one week, notice which tasks you give to your AI agent that feel like overkill for a top-tier model. Boilerplate? Renaming? Formatting? Those are your candidates.
2. **Try one cheaper model.** Pick a cheap model (MiniMax, Haiku, o4-mini) and use it for those tasks. See if the output quality is acceptable.
3. **Add the plan-then-execute pattern.** For your next multi-file feature, write the plan with your best model, then hand each step to the cheap model. Compare the result to what you’d get doing everything on one model.
4. **Formalize the routing.** Once you know which tasks work on which models, set up whatever automation your tools support. Or just keep doing it manually. The awareness matters more than the tooling.
The goal isn’t to minimize cost. It’s to spend your AI budget where it has the most impact: on the tasks that are actually hard.