Fallbacks
Sometimes individual providers have outages, which can disrupt live workflows in production. To combat this, you can set a list of fallback providers, models, endpoints, or even entire fallback queries (useful if different models require unique system messages, temperature settings, etc.).
If one provider goes down or fails to respond for some reason, the request will go to the next model on the list, and so on, until either the request succeeds or the end of the list is reached.
Providers
Let's assume you're deploying `claude-3-opus` for a production application, where downtime would be detrimental for your users. You can set `anthropic` as your default provider, with `aws-bedrock` as your fallback provider if Anthropic goes down or there are rate limit issues. In Python, this can be specified like so:
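(The sketch below is illustrative rather than canonical: the OpenAI-compatible base URL `https://api.unify.ai/v0/` and the `->` separator for chaining fallback endpoints in the model string are assumptions here; check the API reference for the exact syntax.)

```python
import os
from openai import OpenAI

# Point the standard OpenAI client at Unify (base URL assumed here).
client = OpenAI(
    base_url="https://api.unify.ai/v0/",
    api_key=os.environ["UNIFY_KEY"],
)

# Try Anthropic first; fall back to AWS Bedrock ("->" chaining is assumed syntax).
response = client.chat.completions.create(
    model="claude-3-opus@anthropic->claude-3-opus@aws-bedrock",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```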
Models
The same logic can also be applied to different models with the same provider. For example, maybe you're deploying `gemini-1.5-pro` on `vertex-ai` but hit occasional rate limit issues, in which case you want to fall back to `gemini-1.5-flash`, again on `vertex-ai`. This can be specified like so:
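(Same assumptions as above: the base URL and the `->` chaining are illustrative, not confirmed syntax.)

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.unify.ai/v0/", api_key=os.environ["UNIFY_KEY"])

# Prefer gemini-1.5-pro, fall back to gemini-1.5-flash, both served via vertex-ai.
response = client.chat.completions.create(
    model="gemini-1.5-pro@vertex-ai->gemini-1.5-flash@vertex-ai",
    messages=[{"role": "user", "content": "Hello!"}],
)
```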
Endpoints
Finally, you can also specify fallbacks for the entire endpoint. For example, maybe you don't want to keep your users waiting, so if the first attempt with a larger model fails, you just want to give them a response quickly from a smaller, faster endpoint:
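(The specific endpoints below are just for illustration, and the `->` chaining remains an assumption.)

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.unify.ai/v0/", api_key=os.environ["UNIFY_KEY"])

# Try a large model first, then fall back to a smaller, faster endpoint.
response = client.chat.completions.create(
    model="claude-3-opus@anthropic->llama-3.1-8b-chat@groq",
    messages=[{"role": "user", "content": "Hello!"}],
)
```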
Compositions
The fallback logic can be composed arbitrarily across models, providers, and endpoints.
For example, the following will attempt `llama-3.1-405b-chat` across a variety of providers before falling back to `llama-3.1-70b-chat@groq` if they all fail:
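(The particular providers in the chain below are illustrative, and the `->` chaining is an assumption as before.)

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.unify.ai/v0/", api_key=os.environ["UNIFY_KEY"])

# Try llama-3.1-405b-chat on several providers, then fall back to the 70b model on Groq.
response = client.chat.completions.create(
    model=(
        "llama-3.1-405b-chat@together-ai->"
        "llama-3.1-405b-chat@fireworks-ai->"
        "llama-3.1-405b-chat@aws-bedrock->"
        "llama-3.1-70b-chat@groq"
    ),
    messages=[{"role": "user", "content": "Hello!"}],
)
```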
As another example, the following will attempt `gemini-1.5-pro` and `gemini-1.5-flash` with `vertex-ai`, then attempt `llama-3.1-405b-chat` with `together-ai` and `fireworks-ai` if they fail, and then finally attempt `llama-3.1-70b-chat@groq`:
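(Again, the `->` chaining is assumed syntax.)

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.unify.ai/v0/", api_key=os.environ["UNIFY_KEY"])

# gemini-1.5-pro then gemini-1.5-flash on vertex-ai, then llama-3.1-405b-chat on
# together-ai and fireworks-ai, and finally llama-3.1-70b-chat on groq.
response = client.chat.completions.create(
    model=(
        "gemini-1.5-pro@vertex-ai->"
        "gemini-1.5-flash@vertex-ai->"
        "llama-3.1-405b-chat@together-ai->"
        "llama-3.1-405b-chat@fireworks-ai->"
        "llama-3.1-70b-chat@groq"
    ),
    messages=[{"role": "user", "content": "Hello!"}],
)
```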
Regions
For enterprise users deploying on-prem, you can also route across different regions. For example, let's assume you are deploying `gemini-1.5-pro` on `us-east1`, but sometimes hit rate limits. In Python, you can add a fallback to `us-west1` as follows:
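(The `provider:region` qualifier below is purely hypothetical notation used to illustrate the idea; the real region syntax may differ, so check the API reference.)

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://api.unify.ai/v0/", api_key=os.environ["UNIFY_KEY"])

# Hypothetical "provider:region" notation: prefer us-east1, fall back to us-west1.
response = client.chat.completions.create(
    model="gemini-1.5-pro@vertex-ai:us-east1->gemini-1.5-pro@vertex-ai:us-west1",
    messages=[{"role": "user", "content": "Hello!"}],
)
```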
If a fallback region is specified, then fallback models, providers and endpoints cannot be specified as well. Fallback regions are only compatible with individual endpoints.
Queries
As mentioned above, fallbacks can also be specified for entire queries. This is useful in cases where aspects of the prompt need to be handled in a model-specific or provider-specific manner.
Taking the very first example again, let's assume you're deploying `claude-3-opus` for a production application. You would like to use `aws-bedrock` to make use of credits on your own account, but you want to use `anthropic` as a fallback. You have your own API key for Bedrock (and so you'll be passing `"use_custom_keys": true` in the request), but you don't have your own API key for Anthropic. This fallback logic can be specified as follows:
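(Since the Python SDK does not yet cover this, the sketch below hits the HTTP API directly. The `fallback_queries` wrapper is hypothetical notation used to illustrate the idea of a list of complete queries tried in order; it is not the confirmed request schema, and the URL is assumed.)

```python
import os
import requests

queries = [
    {   # primary: Bedrock, billed against your own AWS account
        "model": "claude-3-opus@aws-bedrock",
        "messages": [{"role": "user", "content": "Hello!"}],
        "use_custom_keys": True,
    },
    {   # fallback: Anthropic via your Unify API key (no custom key on file)
        "model": "claude-3-opus@anthropic",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
]

response = requests.post(
    "https://api.unify.ai/v0/chat/completions",                  # assumed URL
    headers={"Authorization": f"Bearer {os.environ['UNIFY_KEY']}"},
    json={"fallback_queries": queries},                          # hypothetical field
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```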
Fallback queries can also be combined with fallback models, providers, and endpoints. For example, the fallback logic below will attempt `claude-3-opus@aws-bedrock` with custom API keys, then `vertex-ai` (again with custom API keys, to use your own GCP credits), before finally falling back to `anthropic` (via your Unify API key) if both fail:
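(Same caveat as above: the `fallback_queries` wrapper and the URL are hypothetical, shown only to illustrate the ordering of the queries.)

```python
import os
import requests

queries = [
    {"model": "claude-3-opus@aws-bedrock", "use_custom_keys": True,   # your AWS credits
     "messages": [{"role": "user", "content": "Hello!"}]},
    {"model": "claude-3-opus@vertex-ai", "use_custom_keys": True,     # your GCP credits
     "messages": [{"role": "user", "content": "Hello!"}]},
    {"model": "claude-3-opus@anthropic",                               # via your Unify key
     "messages": [{"role": "user", "content": "Hello!"}]},
]

requests.post(
    "https://api.unify.ai/v0/chat/completions",                  # assumed URL
    headers={"Authorization": f"Bearer {os.environ['UNIFY_KEY']}"},
    json={"fallback_queries": queries},                          # hypothetical field
    timeout=60,
)
```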
Currently, fallback queries are not supported in the Python SDK, but if this would be useful, then feel free to ping us on Discord! 👾