When choosing which LLM to use, it’s often very useful to simply get a “vibe check”, and see which LLMs seem to respond best to the kinds of questions you’d like to ask.

Trying out LLMs one after the other can be tedious. We’ve therefore made it very easy to compare LLM outputs to the same question side-by-side, both via our browser-based chat interface, and the MultiLLM and MultiLLMAsync classes in our Python SDK.

Single LLM Clients

Before diving into the Multi-LLM clients, it’s helpful to quickly recap how the UniLLM clients, Unify and AsyncUnify, are used:

import unify
client = unify.Unify("gpt-4o@openai")
print(client.generate("Hello, how's it going?"))

For optimal performance in handling multiple user requests simultaneously, such as in a chatbot application, processing them asynchronously is recommended. A minimal example using AsyncUnify is given below:

import unify
import asyncio
async_client = unify.AsyncUnify("llama-3-8b-chat@fireworks-ai")
print(asyncio.run(async_client.generate("Hello Llama! Who was Isaac Newton?")))

For a more applied example, processing multiple requests in parallel can be done as follows:

import unify
import asyncio

clients = dict()
clients["gpt-4o@openai"] = unify.AsyncUnify("gpt-4o@openai")
clients["claude-3-opus@anthropic"] = unify.AsyncUnify("claude-3-opus@anthropic")
clients["llama-3-8b-chat@fireworks-ai"] = unify.AsyncUnify("llama-3-8b-chat@fireworks-ai")


async def generate_responses(user_message: str):
    # fire off all requests concurrently, then collect the results per endpoint
    results = await asyncio.gather(
        *(client.generate(user_message) for client in clients.values())
    )
    return dict(zip(clients.keys(), results))


responses = asyncio.run(generate_responses("Hello, how's it going?"))
for endpoint, response in responses.items():
    print("endpoint: {}".format(endpoint))
    print("response: {}\n".format(response))

Functionality-wise, the asynchronous and synchronous clients are identical.
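As a minimal sketch (reusing the gpt-4o@openai endpoint from above), the same prompt can be sent through either client, with only the invocation style differing:

import unify
import asyncio

# synchronous and asynchronous clients pointing at the same endpoint
sync_client = unify.Unify("gpt-4o@openai")
async_client = unify.AsyncUnify("gpt-4o@openai")

# the synchronous call blocks, the asynchronous one is driven via asyncio.run
print(sync_client.generate("Who was Isaac Newton?"))
print(asyncio.run(async_client.generate("Who was Isaac Newton?")))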

Multi-LLM Clients

Both MultiLLM and MultiLLMAsync wrap AsyncUnify instances under the hood, such that the LLMs are queried in parallel. The distinction between the two is whether the .generate() method is itself an asynchronous function, which can then be nested inside a broader program orchestrated by asyncio.run.

An interactive session with several LLMs can be spun up in Python very quickly, as follows:

import unify
endpoints = ("llama-3-8b-chat@together-ai", "gpt-4o@openai", "claude-3.5-sonnet@anthropic")
client = unify.MultiLLM(endpoints=endpoints)
responses = client.generate("Hello, how's it going?")
for endpoint, response in responses.items():
    print("endpoint: {}".format(endpoint))
    print("response: {}\n".format(response))

If you want to query several Multi-LLM clients in parallel, then it’s best to use MultiLLMAsync, as follows:

import unify
import asyncio

openai_endpoints = ("gpt-4o@openai", "gpt-4@openai", "gpt-4-turbo@openai")
openai_client = unify.MultiLLMAsync(
    endpoints=openai_endpoints,
    system_message="This is a system message specifically optimized for OpenAI models."
)
anthropic_endpoints = ("claude-3-opus@anthropic", "claude-3.5-sonnet@anthropic")
anthropic_client = unify.MultiLLMAsync(
    endpoints=anthropic_endpoints,
    system_message="This is a system message specifically optimized for Anthropic models."
)

async def generate_responses(user_message: str):
    # query both Multi-LLM clients concurrently
    openai_responses, anthropic_responses = await asyncio.gather(
        openai_client.generate(user_message),
        anthropic_client.generate(user_message),
    )
    return {"openai": openai_responses, "anthropic": anthropic_responses}


all_responses = asyncio.run(generate_responses("Hello, how's it going?"))
for provider, responses in all_responses.items():
    print("provider: {}\n".format(provider))
    for endpoint, response in responses.items():
        print("endpoint: {}".format(endpoint))
        print("response: {}\n".format(response))

As explained in the Arguments page for the Unify client, setters can also be chained for Multi-LLM clients, like so:

import unify
endpoints = (
    "llama-3-8b-chat@together-ai",
    "gpt-4o@openai",
    "claude-3.5-sonnet@anthropic"
)
client = unify.MultiLLM(endpoints=endpoints)
client.add_endpoints(
    ["gpt-4@openai", "gpt-4-turbo@openai"]
).remove_endpoints(
    "claude-3.5-sonnet@anthropic"
)
assert set(client.endpoints) == {
    "llama-3-8b-chat@together-ai",
    "gpt-4o@openai",
    "gpt-4@openai",
    "gpt-4-turbo@openai"
}