When choosing which LLM to use, it’s often very useful to simply get a “vibe check”,
and see which LLMs seem to respond best to the kinds of questions you’d like to ask.
Trying out LLMs one after the other can be tedious. We’ve therefore made it very easy to
compare LLM outputs to the same question side-by-side, both via our browser-based
chat interface and via the MultiLLM and MultiLLMAsync classes in our Python SDK.
Single LLM Clients
Before diving into the Multi-LLM clients, it’s helpful to have a quick recap of how the
UniLLM clients, Unify and AsyncUnify, can be used:
import unify
client = unify.Unify("gpt-4o@openai")
print(client.generate("Hello, how is it going?"))
For optimal performance in handling multiple user requests simultaneously,
such as in a chatbot application, processing them asynchronously is recommended.
A minimal example using AsyncUnify is given below:
import unify
import asyncio
async_client = unify.AsyncUnify("llama-3-8b-chat@fireworks-ai")
asyncio.run(async_client.generate("Hello Llama! Who was Isaac Newton?"))
For a more applied example, multiple requests can be processed in parallel as follows:
import unify
import asyncio
clients = dict()
clients["gpt-4o@openai"] = unify.AsyncUnify("gpt-4o@openai")
clients["claude-3-opus@anthropic"] = unify.AsyncUnify("claude-3-opus@anthropic")
clients["llama-3-8b-chat@fireworks-ai"] = unify.AsyncUnify("llama-3-8b-chat@fireworks-ai")
async def generate_responses(user_message: str):
    # schedule all requests concurrently and gather the responses
    responses_ = await asyncio.gather(
        *[client.generate(user_message) for client in clients.values()]
    )
    return dict(zip(clients.keys(), responses_))
responses = asyncio.run(generate_responses("Hello, how's it going?"))
for endpoint, response in responses.items():
print("endpoint: {}".format(endpoint))
print("response: {}\n".format(response))
Functionality-wise, the asynchronous and synchronous clients are identical.
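As a quick illustration (a minimal sketch reusing the clients from above), the same .generate() call works on both; the only difference is that the asynchronous variant needs to be awaited, e.g. via asyncio.run:
import unify
import asyncio
sync_client = unify.Unify("gpt-4o@openai")
async_client = unify.AsyncUnify("gpt-4o@openai")
# identical interface; only the calling convention differs
print(sync_client.generate("Hello, how is it going?"))
print(asyncio.run(async_client.generate("Hello, how is it going?")))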
Multi-LLM Clients
Both MultiLLM and MultiLLMAsync wrap AsyncUnify instances under the hood,
such that the LLMs are queried in parallel. The distinction between MultiLLM and
MultiLLMAsync refers to whether the .generate() method is itself an asynchronous
function, which can be nested inside a broader outer program orchestrated by asyncio.run.
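Concretely, the difference looks roughly like this (a minimal sketch using endpoints that appear elsewhere in this guide):
import unify
import asyncio
endpoints = ("gpt-4o@openai", "claude-3-opus@anthropic")
# MultiLLM: .generate() is a regular blocking call
multi_llm = unify.MultiLLM(endpoints=endpoints)
responses = multi_llm.generate("Hello!")
# MultiLLMAsync: .generate() is a coroutine, so it must be awaited,
# here via asyncio.run, or nested inside a larger async program
multi_llm_async = unify.MultiLLMAsync(endpoints=endpoints)
responses = asyncio.run(multi_llm_async.generate("Hello!"))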
An interactive session with several LLMs can be spun up in Python very quickly, as follows:
import unify
endpoints = ("llama-3-8b-chat@together-ai", "gpt-4o@openai", "claude-3.5-sonnet@anthropic")
client = unify.MultiLLM(endpoints=endpoints)
responses = client.generate("Hello, how is it going?")
for endpoint, response in responses.items():
print("endpoint: {}".format(endpoint))
print("response: {}\n".format(response))
If you want to query several multi-LLM clients in parallel,
then it’s best to use MultiLLMAsync, as follows:
import unify
import asyncio
openai_endpoints = ("gpt-4o@openai", "gpt-4@openai", "gpt-4-turbo@openai")
openai_client = unify.MultiLLMAsync(
    endpoints=openai_endpoints,
    system_message="This is a system message specifically optimized for OpenAI models."
)
anthropic_endpoints = ("claude-3-opus@anthropic", "claude-3.5-sonnet@anthropic")
anthropic_client = unify.MultiLLMAsync(
    endpoints=anthropic_endpoints,
    system_message="This is a system message specifically optimized for Anthropic models."
)
async def generate_responses(user_message: str):
    # query both multi-LLM clients concurrently
    openai_responses, anthropic_responses = await asyncio.gather(
        openai_client.generate(user_message),
        anthropic_client.generate(user_message)
    )
    return {"openai": openai_responses, "anthropic": anthropic_responses}
all_responses = asyncio.run(generate_responses("Hello, how's it going?"))
for provider, responses in all_responses.items():
    print("provider: {}\n".format(provider))
    for endpoint, response in responses.items():
        print("endpoint: {}".format(endpoint))
        print("response: {}\n".format(response))
As explained in the Arguments page for the Unify client,
setters can also be chained for multi-LLM clients, like so:
import unify
endpoints = (
    "llama-3-8b-chat@together-ai",
    "gpt-4o@openai",
    "claude-3.5-sonnet@anthropic"
)
client = unify.MultiLLM(endpoints=endpoints)
client.add_endpoints(
    ["gpt-4@openai", "gpt-4-turbo@openai"]
).remove_endpoints(
    "claude-3.5-sonnet@anthropic"
)
assert set(client.endpoints) == {
    "llama-3-8b-chat@together-ai",
    "gpt-4o@openai",
    "gpt-4@openai",
    "gpt-4-turbo@openai"
}
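The updated client can then be queried in the usual way, for example:
responses = client.generate("Hello, how is it going?")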