# Routing

## Overview
LLM routing lets you choose which model, provider, and endpoint handles each prompt. This flexibility is advantageous for several reasons:
- Small models are (in general) faster and cheaper, whereas bigger models are more capable.
- Tasks vary in difficulty, so the level of LLM capability they require also varies.
- Different providers have different latencies, and these change over time.
- New models come out every week, each having different strengths and weaknesses.
LLM routing provides:
- Faster and cheaper responses when a smaller model is capable of answering
- Continuous improvement: ‘riding the wave’ of new model releases
- Ability to maximise throughput or minimise latency based on live runtime statistics
- Reliability via fallbacks, if providers go down or latency limits are hit
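The ideas above can be sketched in a few lines. This is a minimal, hypothetical illustration, not a real routing library: the endpoint names, the `length_heuristic` difficulty score, and the `call_with_fallback` helper are all assumptions made for the example (production routers typically use a trained difficulty classifier and live latency statistics rather than prompt length).

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Endpoint:
    provider: str        # hypothetical provider name
    model: str           # hypothetical model name
    cost_per_token: float

def route(prompt: str,
          score_difficulty: Callable[[str], float],
          small: Endpoint,
          large: Endpoint,
          threshold: float = 0.5) -> Endpoint:
    """Send easy prompts to the small, cheap model and hard ones to the large one."""
    return small if score_difficulty(prompt) < threshold else large

def length_heuristic(prompt: str) -> float:
    """Toy difficulty score in [0, 1]: longer prompts count as harder."""
    return min(len(prompt) / 500, 1.0)

def call_with_fallback(prompt: str,
                       endpoints: List[Endpoint],
                       call: Callable[[Endpoint, str], str]) -> str:
    """Try endpoints in order, falling back when a provider errors out."""
    for ep in endpoints:
        try:
            return call(ep, prompt)
        except RuntimeError:
            continue  # provider down or over its latency limit; try the next one
    raise RuntimeError("all endpoints failed")

small = Endpoint("provider-a", "small-model", cost_per_token=0.1)
large = Endpoint("provider-b", "large-model", cost_per_token=1.0)

chosen = route("What is 2 + 2?", length_heuristic, small, large)
print(chosen.model)  # short prompt scores as easy, so the small model is chosen
```

Swapping `length_heuristic` for a learned classifier, or re-ordering the fallback list from live latency measurements, changes the routing policy without touching any calling code.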