Model Routing
- Definition
- A system that dynamically selects which AI model handles each request, based on the query's complexity, cost constraints, latency requirements, or domain. Model routing lets applications reserve expensive frontier models for requests that need them and use cheap, efficient models for everything else.
- Why it matters
- Running GPT-4-class models on every request is like flying first class for a 30-minute commute: technically possible but economically irrational. Model routing is how companies cut inference costs by 60-80% without sacrificing quality on the requests that matter. The pattern is simple: classify incoming requests by difficulty, route easy ones to cheap models, and reserve frontier models for hard problems. This is the infrastructure pattern that makes AI economically viable at scale. If your AI vendor charges frontier pricing for every request without routing, you are likely overpaying by 3x or more. Model routing is not optional for production AI; it is table stakes.
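The classify-then-dispatch pattern can be sketched in a few lines. This is a minimal, illustrative sketch only: the model names and the keyword heuristic are placeholders (not any vendor's API), and production routers use trained classifiers rather than keyword matching.

```python
# Toy sketch of the routing pattern: classify each request's difficulty,
# then dispatch to a cheap or frontier tier. All names are hypothetical.

def classify_difficulty(prompt: str) -> str:
    """Keyword heuristic standing in for a trained difficulty classifier."""
    hard_signals = ("prove", "debug", "multi-step", "analyze")
    if len(prompt) > 500 or any(s in prompt.lower() for s in hard_signals):
        return "hard"
    return "easy"

# Hypothetical model identifiers for each tier.
MODEL_TIERS = {
    "easy": "efficient-model-v1",  # cheap, handles routine requests
    "hard": "frontier-model-v1",   # expensive, reserved for hard problems
}

def route(prompt: str) -> str:
    """Return the model identifier that should handle this prompt."""
    return MODEL_TIERS[classify_difficulty(prompt)]
```

A short prompt like "What is 2+2?" routes to the efficient tier, while one containing a hard signal like "debug" escalates to the frontier tier.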
- In practice
- Martian's model router uses a trained classifier to predict which model will produce the best output for each request, claiming a 40% cost reduction at equivalent quality. OpenRouter provides a unified API across 100+ models with automatic routing options. Vercel AI SDK includes model selection capabilities for multi-provider architectures. Cloudflare AI Gateway routes requests across providers with cost and latency optimization. LMSYS Chatbot Arena data is used to benchmark routing accuracy; published results show that a well-tuned router can match frontier-model quality on 70-80% of requests using models that cost 10-20x less. Enterprise deployments typically route 60-70% of traffic to efficient models and the remaining 30-40% to frontier models.
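The economics behind these deployments can be checked with simple arithmetic. A hedged sketch, assuming a classifier that emits a "needs frontier" score in [0, 1] and two made-up price points; the model names, prices, and threshold are hypothetical:

```python
# Score-threshold routing plus blended-cost arithmetic. Prices are
# illustrative placeholders, not real vendor rates.
from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    cost_per_m_tokens: float  # USD per million tokens

CHEAP = ModelTier("efficient-model", 0.50)
FRONTIER = ModelTier("frontier-model", 10.00)

def route_by_score(score: float, threshold: float = 0.65) -> ModelTier:
    """Send high-difficulty scores to the frontier tier, the rest to cheap."""
    return FRONTIER if score >= threshold else CHEAP

def blended_cost(frontier_share: float) -> float:
    """Average cost per million tokens for a given frontier traffic share."""
    return (frontier_share * FRONTIER.cost_per_m_tokens
            + (1 - frontier_share) * CHEAP.cost_per_m_tokens)
```

With a 35% frontier share (the midpoint of the enterprise split cited above) and these placeholder prices, `blended_cost(0.35)` comes to about $3.83 per million tokens versus $10.00 all-frontier, roughly a 62% saving, which lands inside the 60-80% range quoted earlier.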
Related terms
Inference cost
The expense of running an AI model in production, typically measured per million tokens. Inference costs have dropped 10-100x in the past two years, enabling new business models and use cases.
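Because inference cost is quoted per million tokens, per-request cost is a simple proration. An illustrative helper; the prices in the example are placeholders, not real vendor rates:

```python
# Convert per-million-token prices into the cost of a single request.
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD for one request, given per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# e.g. 2,000 input + 500 output tokens at hypothetical $1 / $3 per million
cost = request_cost(2000, 500, 1.0, 3.0)  # 0.0035 USD
```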
Inference economics
The study of costs, pricing models, and margin structures around running AI models in production, encompassing hardware costs, model efficiency, pricing strategies, and the competitive dynamics of the inference market.
Efficient model
A model designed to deliver strong performance at a fraction of the compute cost of frontier models, through architectural innovations, aggressive distillation, or better training data curation. Efficient models prioritize the performance-per-dollar ratio.
Frontier model
The most capable AI model available at any given time, representing the current state of the art. Frontier models push the boundaries of what AI can do and are typically the most expensive to train and run.