Fallbacks
Individual providers occasionally have outages, which can disrupt live workflows in production. To guard against this, you can set a list of fallback providers, models, endpoints, or even entire fallback queries (useful when different models require their own system messages, temperature, etc.). If one provider goes down or fails to respond for some reason, the request is passed to the next entry on the list, and so on, until either the request succeeds or the end of the list is reached.
Fallback Models, Providers, Endpoints and Regions
Let's assume you're deploying claude-3-opus for a production application, where downtime would be detrimental to your users. You can set anthropic as your default provider, with aws-bedrock as your fallback provider if Anthropic goes down or you run into rate limit issues, like so:
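(As a sketch, assuming the fallback chain is written directly into the endpoint string, with full model@provider entries joined by an arrow; the -> notation here is an assumption rather than confirmed syntax.)

```
claude-3-opus@anthropic->claude-3-opus@aws-bedrock
```

Requests go to anthropic first, and are only re-routed to aws-bedrock if the Anthropic attempt fails.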
Of course, the same can be done via any of the query methods (HTTP Requests, Unify Python Package, OpenAI Python Package, OpenAI NodeJS Package), by simply specifying the model argument in each case. For example, in the Unify Python Package it would simply look like this:
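(Sketch only: it assumes unify.Unify() accepts the endpoint string as its first argument and that .generate() sends a single user message, with the same assumed -> chaining as above.)

```python
import unify

# Preferred endpoint first; the "->" fallback chaining is assumed syntax.
client = unify.Unify("claude-3-opus@anthropic->claude-3-opus@aws-bedrock")

# If the Anthropic attempt fails, the request is retried on AWS Bedrock.
response = client.generate("Summarise today's incident report.")
print(response)
```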
The same logic can also be applied to different models with the same provider. For example, maybe you're deploying gemini-1.5-pro on vertex-ai but hit occasional rate limit issues, in which case you want to fall back to gemini-1.5-flash, again on vertex-ai. This can be specified like so:
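(Same assumed -> notation as above; only the model part changes between the two entries.)

```python
import unify

# Both entries target vertex-ai; only the model differs.
client = unify.Unify("gemini-1.5-pro@vertex-ai->gemini-1.5-flash@vertex-ai")

response = client.generate("Draft a short product announcement.")
```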
Finally, you can also specify fallbacks for the entire endpoint. For example, maybe you don't want to keep your users waiting, so if the first attempt with a larger model fails, you just want to give them a response quickly from a smaller one:
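(Sketch with the same assumed -> notation, reusing endpoints that appear elsewhere in this section.)

```python
import unify

# Try the large model first; if that attempt fails, answer quickly with a
# smaller model on a fast provider instead.
client = unify.Unify("llama-3.1-405b-chat@together-ai->llama-3.1-70b-chat@groq")

response = client.generate("Give me a one-paragraph answer.")
```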
The fallback logic can be composed arbitrarily across models, providers and endpoints. For example, the following will attempt llama-3.1-405b-chat across a variety of providers before falling back to llama-3.1-70b-chat@groq if they all fail:
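(Same assumed -> notation; the particular providers chosen here are illustrative.)

```python
import unify

# Try llama-3.1-405b-chat on several providers, then fall back to the
# smaller 70b model on groq if all of them fail.
client = unify.Unify(
    "llama-3.1-405b-chat@together-ai"
    "->llama-3.1-405b-chat@fireworks-ai"
    "->llama-3.1-70b-chat@groq"
)

response = client.generate("Explain the fallback order of this client.")
```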
As another example, the following will attempt gemini-1.5-pro and gemini-1.5-flash with vertex-ai, then attempt llama-3.1-405b-chat with together-ai and fireworks-ai if they fail, and then finally attempt llama-3.1-70b-chat@groq:
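(Same assumed -> notation as above.)

```python
import unify

# Fallback order, composed across models, providers and endpoints:
#   1. gemini-1.5-pro on vertex-ai
#   2. gemini-1.5-flash on vertex-ai
#   3. llama-3.1-405b-chat on together-ai
#   4. llama-3.1-405b-chat on fireworks-ai
#   5. llama-3.1-70b-chat on groq
client = unify.Unify(
    "gemini-1.5-pro@vertex-ai"
    "->gemini-1.5-flash@vertex-ai"
    "->llama-3.1-405b-chat@together-ai"
    "->llama-3.1-405b-chat@fireworks-ai"
    "->llama-3.1-70b-chat@groq"
)

response = client.generate("Which endpoint ended up answering this request?")
```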
For enterprise users deploying on-prem, you can also route across different regions. For example, let's assume you are deploying gemini-1.5-pro on us-east1, but sometimes hit rate limits. You can add a fallback to us-west1 as follows:
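(The region notation below is purely hypothetical, assuming the region is appended to the provider and that fallback regions are chained in the same assumed way.)

```
gemini-1.5-pro@vertex-ai:us-east1->gemini-1.5-pro@vertex-ai:us-west1
```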
With the Python client, it would look as follows:
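(Same hypothetical region notation, passed straight through as the endpoint string.)

```python
import unify

# Hypothetical region syntax: region appended to the provider, with the
# fallback region chained via "->".
client = unify.Unify(
    "gemini-1.5-pro@vertex-ai:us-east1->gemini-1.5-pro@vertex-ai:us-west1"
)

response = client.generate("Hello from us-east1, hopefully!")
```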
If a fallback region is specified, then fallback models, providers and endpoints cannot be specified as well. Fallback regions are only compatible with individual endpoints.
Fallback Queries
As mentioned above, fallbacks can also be specified for entire queries. This is useful in cases where aspects of the prompt need to be handled in a model-specific or provider-specific manner.
Taking the very first example again, let's assume you're deploying claude-3-opus for a production application. You would like to use aws-bedrock to make use of credits on your own account, but you want to use anthropic as a fallback. You have your own API key for Bedrock (and so you'll be passing "use_custom_keys": true in the request), but you don't have your own API key for Anthropic. This fallback logic can be specified as follows:
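(Sketch of the raw HTTP request via Python's requests library; the fallbacks field holding complete fallback request bodies is a hypothetical shape, and the base URL is assumed.)

```python
import os
import requests

payload = {
    # Primary query: claude-3-opus on aws-bedrock, using your own AWS keys.
    "model": "claude-3-opus@aws-bedrock",
    "messages": [{"role": "user", "content": "Hello!"}],
    "use_custom_keys": True,
    # Hypothetical field: a list of complete fallback request bodies.
    "fallbacks": [
        {
            # Fallback query: same model on anthropic, without custom keys.
            "model": "claude-3-opus@anthropic",
            "messages": [{"role": "user", "content": "Hello!"}],
            "use_custom_keys": False,
        }
    ],
}

response = requests.post(
    "https://api.unify.ai/v0/chat/completions",  # assumed base URL
    headers={"Authorization": f"Bearer {os.environ['UNIFY_KEY']}"},
    json=payload,
    timeout=60,
)
print(response.json())
```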
Fallback queries can also be combined with fallback models, providers, and endpoints. For example, the fallback logic below will attempt claude-3-opus@aws-bedrock with custom API keys, then vertex-ai (again with custom API keys, to use your own GCP credits), before finally falling back to anthropic if both fail:
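(Same hypothetical fallbacks field and assumed base URL as in the previous sketch.)

```python
import os
import requests

payload = {
    # 1. claude-3-opus on aws-bedrock, with your own AWS keys.
    "model": "claude-3-opus@aws-bedrock",
    "messages": [{"role": "user", "content": "Hello!"}],
    "use_custom_keys": True,
    "fallbacks": [
        {
            # 2. claude-3-opus on vertex-ai, again with custom keys to use
            #    your own GCP credits.
            "model": "claude-3-opus@vertex-ai",
            "messages": [{"role": "user", "content": "Hello!"}],
            "use_custom_keys": True,
        },
        {
            # 3. Final fallback: anthropic, without custom keys.
            "model": "claude-3-opus@anthropic",
            "messages": [{"role": "user", "content": "Hello!"}],
            "use_custom_keys": False,
        },
    ],
}

response = requests.post(
    "https://api.unify.ai/v0/chat/completions",  # assumed base URL
    headers={"Authorization": f"Bearer {os.environ['UNIFY_KEY']}"},
    json=payload,
    timeout=60,
)
```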
Currently, fallbacks across queries are not supported in the Python SDK, but if this would be useful, feel free to hop on a call or ping us on Discord! 👾