Sometimes individual providers have outages, which can disrupt live workflows in production. To combat this, you can set a list of fallback providers, models, endpoints, or even entire fallback queries (useful if different models require unique system messages, temperatures, etc.). If one provider goes down or fails to respond for some reason, the request is routed to the next model on the list, and so on, until either the request succeeds or the end of the list is reached.
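
Conceptually, the routing behaves like the loop below. This is a purely illustrative sketch: the actual fallback handling happens on Unify’s servers, not in your client code, and the send() argument is just a stand-in for a single provider call.

def route_with_fallbacks(send, query, endpoints):
    """Try each endpoint in order until one succeeds (illustrative only)."""
    for endpoint in endpoints:
        try:
            return send(query, endpoint)  # send() stands in for one provider request
        except Exception:                 # outage, rate limit, timeout, ...
            continue                      # move on to the next endpoint in the list
    raise RuntimeError("all fallback endpoints failed")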

Fallback Models, Providers, Endpoints and Regions

Let’s assume you’re deploying claude-3-opus for a production application, where downtime would be detrimental to your users. You can set anthropic as your default provider, with aws-bedrock as your fallback provider in case Anthropic goes down or you hit rate limit issues, like so:

curl -X 'POST' \
    'https://api.unify.ai/v0/chat/completions' \
    -H "Authorization: Bearer $UNIFY_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "claude-3-opus@anthropic->aws-bedrock",
        "messages": [{"role": "user", "content": "Hello."}]
    }'

Of course, the same can be done via any of the query methods (HTTP requests, the Unify Python Package, the OpenAI Python Package, or the OpenAI NodeJS Package) by simply specifying the model argument in each case. For example, in the Unify Python Package it would simply look like this:

import unify
client = unify.Unify("claude-3-opus@anthropic->aws-bedrock")
response = client.generate("Hello.")
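
Equivalently, with the OpenAI Python Package you can pass the same fallback chain as the model argument. This is a minimal sketch, assuming you point the client at the Unify chat completions endpoint shown in the HTTP example above (which accepts OpenAI-style requests) and pass your Unify key as the API key:

import os
from openai import OpenAI

# Assumption: the base URL below mirrors the endpoint used in the curl example above.
client = OpenAI(
    base_url="https://api.unify.ai/v0",
    api_key=os.environ["UNIFY_KEY"],
)
response = client.chat.completions.create(
    model="claude-3-opus@anthropic->aws-bedrock",
    messages=[{"role": "user", "content": "Hello."}],
)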

The same logic can also be applied to different models from the same provider. For example, maybe you’re deploying gemini-1.5-pro on vertex-ai but hit occasional rate limit issues, in which case you want to fall back to gemini-1.5-flash, again on vertex-ai. This can be specified like so:

import unify
client = unify.Unify("gemini-1.5-pro->gemini-1.5-flash@vertex-ai")
response = client.generate("Hello.")

Finally, you can also specify a fallback for the entire endpoint (model and provider together). For example, maybe you don’t want to keep your users waiting, so if the first attempt with a larger model fails, you just want to give them a response quickly:

import unify
client = unify.Unify("llama-3.1-405b-chat@together-ai->llama-3.1-70b-chat@groq")
response = client.generate("Hello.")

The fallback logic can be composed arbitrarily across models, providers and endpoints. For example, the following will attempt llama-3.1-405b-chat across a variety of providers before falling back to llama-3.1-70b-chat@groq if they all fail:

import unify
# Try llama-3.1-405b-chat on together-ai, fireworks-ai and vertex-ai in turn,
# then fall back to llama-3.1-70b-chat on groq.
client = unify.Unify("llama-3.1-405b-chat@together-ai->fireworks-ai->vertex-ai->llama-3.1-70b-chat@groq")
response = client.generate("Hello.")

As another example, the following will attempt gemini-1.5-pro and then gemini-1.5-flash on vertex-ai, then attempt llama-3.1-405b-chat on together-ai, fireworks-ai and vertex-ai if they fail, and finally attempt llama-3.1-70b-chat@groq:

import unify
# Endpoints are attempted left to right: gemini-1.5-pro then gemini-1.5-flash on vertex-ai,
# then llama-3.1-405b-chat on together-ai, fireworks-ai and vertex-ai,
# and finally llama-3.1-70b-chat on groq.
client = unify.Unify("gemini-1.5-pro->gemini-1.5-flash@vertex-ai->llama-3.1-405b-chat@together-ai->fireworks-ai->vertex-ai->llama-3.1-70b-chat@groq")
response = client.generate("Hello.")

For enterprise users deploying on-prem, you can also route across different regions. For example, let’s assume you are deploying gemini-1.5-pro on us-east1 but sometimes hit rate limits. You can add a fallback to us-west1 as follows:

curl -X 'POST' \
    'https://api.unify.ai/v0/chat/completions' \
    -H "Authorization: Bearer $UNIFY_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "gemini-1.5-pro@vertex-ai",
        "messages": [{"role": "user", "content": "Hello."}],
        "region": "us-east1->us-west1"
    }'

With the Python client, it would look as follows:

import unify
client = unify.Unify("gemini-1.5-pro@vertex-ai", region="us-east1->us-west1")
response = client.generate("Hello.")

If a fallback region is specified, then fallback models, providers and endpoints cannot be specified as well. Fallback regions are only compatible with individual endpoints.

Fallback Queries

As mentioned above, fallbacks can also be specified for entire queries. This is useful in cases where aspects of the prompt need to be handled in a model-specific or provider-specific manner.

Taking the very first example again, let’s assume you’re deploying claude-3-opus for a production application. You would like to use aws-bedrock to make use of credits on your own account, but you want to use anthropic as a fallback. You have your own API key for Bedrock (and so you’ll be passing "use_custom_keys": true in the request), but you don’t have your own API key for Anthropic. This fallback logic can be specified as follows:

curl -X 'POST' \
    'https://api.unify.ai/v0/chat/completions' \
    -H "Authorization: Bearer $UNIFY_KEY" \
    -H 'Content-Type: application/json' \
    -d '[
          {
            "model": "claude-3-opus@aws-bedrock",
            "messages": [{"role": "user", "content": "Hello."}],
            "use_custom_keys": true
          },
          {
            "model": "claude-3-opus@anthropic",
            "messages": [{"role": "user", "content": "Hello."}]
          }
    ]'

Fallback queries can also be combined with fallback models, providers, and endpoints. For example, the fallback logic below will attempt claude-3-opus@aws-bedrock with custom API keys, then vertex-ai (again with custom API keys, to use your own GCP credits), before finally falling back to anthropic if both fail:

curl -X 'POST' \
    'https://api.unify.ai/v0/chat/completions' \
    -H "Authorization: Bearer $UNIFY_KEY" \
    -H 'Content-Type: application/json' \
    -d '[
          {
            "model": "claude-3-opus@aws-bedrock->vertex-ai",
            "messages": [{"role": "user", "content": "Hello."}],
            "use_custom_keys": true
          },
          {
            "model": "claude-3-opus@anthropic",
            "messages": [{"role": "user", "content": "Hello."}]
          }
    ]'

Currently, fallbacks across queries are not supported in the Python SDK, but if this would be useful to you, feel free to hop on a call or ping us on Discord! 👾
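
In the meantime, you can approximate query-level fallbacks client-side by catching failures and re-issuing the request through a second client, with each query configured however that model or provider requires. This is a minimal sketch only, using the same generate call shown in the examples above:

import unify

primary = unify.Unify("claude-3-opus@aws-bedrock")   # first choice, e.g. using your own credits
fallback = unify.Unify("claude-3-opus@anthropic")    # second choice if the first query fails

try:
    response = primary.generate("Hello.")   # this query could carry its own system message, temperature, etc.
except Exception:
    response = fallback.generate("Hello.")  # and this one can be configured differently for the fallback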