Providers
Let's assume you're deploying claude-3-opus for a production application, where
downtime would be detrimental for your users. You can set anthropic as your default
provider, with aws-bedrock as your fallback provider in case Anthropic goes down or
you hit rate limit issues, like so:
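A minimal sketch of this setup, assuming Unify's OpenAI-compatible chat endpoint and a `->` fallback operator inside the endpoint string (both assumptions worth verifying against the current Unify docs):

```python
import json

# Provider-level fallback: try anthropic first, fall back to aws-bedrock.
# The "model@provider->fallback_provider" string format is an assumption;
# check the current Unify docs for the exact fallback syntax.
endpoint = "claude-3-opus@anthropic->aws-bedrock"

body = json.dumps({
    "model": endpoint,
    "messages": [{"role": "user", "content": "Hello!"}],
})
# In production you would POST `body` to
# https://api.unify.ai/v0/chat/completions with your Unify API key.
```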
Models
The same logic can also be applied to different models from the same provider. For example, maybe you're deploying gemini-1.5-pro on vertex-ai, but you hit
occasional rate limit issues, in which case you want to fall back to gemini-1.5-flash as
your fallback model, again on vertex-ai. This can be specified like so:
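A sketch of the model-level fallback, assuming the `->` operator can chain models ahead of a single provider suffix (an assumption; confirm against the Unify docs):

```python
# Model-level fallback on one provider: gemini-1.5-pro first, then
# gemini-1.5-flash, both served by vertex-ai. The placement of "->"
# before the "@provider" suffix is an assumption.
endpoint = "gemini-1.5-pro->gemini-1.5-flash@vertex-ai"

# Split the string back apart to show its structure.
model_chain, provider = endpoint.rsplit("@", 1)
print(model_chain.split("->"), provider)
# -> ['gemini-1.5-pro', 'gemini-1.5-flash'] vertex-ai
```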
Endpoints
Finally, you can also specify fallbacks for the entire endpoint. For example, maybe you don't want to keep your users waiting, and so if the first attempt to a larger model fails, you just want to give them a response quickly:
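A sketch of an endpoint-level fallback; both example endpoints here are hypothetical pairings (the text does not name them), and the `->` chaining is assumed syntax:

```python
# Endpoint-level fallback: a large model first, then a fast, smaller model
# so users still get a quick response. Both pairings below are illustrative;
# the "->" chaining syntax is likewise an assumption.
endpoint = "claude-3-opus@anthropic->llama-3.1-8b-chat@groq"
hops = endpoint.split("->")
```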
Compositions
The fallback logic can be composed arbitrarily across models, providers and endpoints. For example, the following will attempt llama-3.1-405b-chat across a variety of providers
before falling back to llama-3.1-70b-chat@groq if they all fail:
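One way to build such a chain programmatically; the provider list is illustrative (the text does not name the providers) and the `->` chaining remains assumed syntax:

```python
# Composed fallback: the same large model across several providers, then a
# smaller model on groq as the last resort. The provider set below is
# hypothetical; check which providers actually serve this model.
providers = ["together-ai", "fireworks-ai"]
endpoint = "->".join(f"llama-3.1-405b-chat@{p}" for p in providers)
endpoint += "->llama-3.1-70b-chat@groq"
```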
As a more involved example, the following will first attempt gemini-1.5-pro and gemini-1.5-flash
with vertex-ai, then attempt llama-3.1-405b-chat with together-ai and
fireworks-ai if they fail, and then finally attempt llama-3.1-70b-chat@groq:
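The full chain above, written out hop by hop; repeating `model@provider` per hop and joining with `->` is an assumption, so check the Unify docs for any shorthand:

```python
# Five-hop composed fallback across models and providers, tried in order.
endpoint = "->".join([
    "gemini-1.5-pro@vertex-ai",
    "gemini-1.5-flash@vertex-ai",
    "llama-3.1-405b-chat@together-ai",
    "llama-3.1-405b-chat@fireworks-ai",
    "llama-3.1-70b-chat@groq",
])
```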
Regions
For enterprise users deploying on-prem, you can also route across different regions. For example, let's assume you are deploying gemini-1.5-pro on us-east1,
but sometimes hit rate limits. You can add a fallback to us-west1 as follows:
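A sketch of a region-level fallback. The `provider:region` suffix shown here is purely illustrative, not confirmed by this text, so treat it as a placeholder and consult the Unify enterprise docs for the real region syntax:

```python
# Region fallback sketch: us-east1 first, us-west1 on rate limits.
# The ":region" suffix is a hypothetical notation for illustration only.
endpoint = "gemini-1.5-pro@vertex-ai:us-east1->vertex-ai:us-west1"
regions = [hop.split(":")[-1] for hop in endpoint.split("->")]
```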
Queries
As mentioned above, fallbacks can also be specified for entire queries. This is useful in cases where aspects of the prompt need to be handled in a model-specific or provider-specific manner. Taking the very first example again, let's assume you're deploying claude-3-opus for a
production application. You would like to use aws-bedrock to make use of the credits on
your own account, but you want to use anthropic as a fallback. You have your own API
key for Bedrock (and so you'll be passing "use_custom_keys": true in the request),
but you don't have your own API key for Anthropic. This fallback logic can be specified
as follows:
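A sketch of a per-query fallback request. Only `"use_custom_keys": true` is confirmed by the text; the `fallbacks` field is a hypothetical request shape, so verify the real format against the Unify docs:

```python
import json

# Per-query fallback sketch: the first attempt uses your own Bedrock key,
# the fallback goes through anthropic on Unify's keys. The "fallbacks"
# list is hypothetical; only "use_custom_keys" appears in the text.
body = {
    "model": "claude-3-opus@aws-bedrock",
    "use_custom_keys": True,
    "messages": [{"role": "user", "content": "Hello!"}],
    "fallbacks": [
        {"model": "claude-3-opus@anthropic", "use_custom_keys": False},
    ],
}
payload = json.dumps(body)
```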
As a final example, the following will first attempt claude-3-opus@aws-bedrock with
custom API keys, then vertex-ai (again with custom API keys, to use your own GCP
credits), before finally falling back to anthropic (via your Unify API key) if both fail:
