Comparisons
When choosing which LLM to use, it's often very useful to simply get a "vibe check" and see which LLMs respond best to the kinds of questions you'd like to ask.
Trying out LLMs one after the other can be tedious. We've therefore made it very easy to compare LLM outputs to the same question side-by-side, both via our browser-based chat interface and via the MultiLLM and MultiLLMAsync classes in our Python SDK.
Single LLM Clients
Before diving into the Multi-LLM clients, it's helpful to have a quick recap on how the single-LLM clients, Unify and AsyncUnify, can be used:
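For reference, a minimal synchronous sketch is shown below. The endpoint string, the model@provider format, and the assumption that the API key is picked up from the UNIFY_KEY environment variable are all illustrative; see the SDK reference for the exact constructor arguments.

```python
import unify

# Synchronous client for a single endpoint.
# The endpoint string is illustrative; any "model@provider"
# endpoint supported by Unify could be used here.
client = unify.Unify("llama-3-8b-chat@together-ai")

# .generate() sends the prompt and blocks until the response arrives.
response = client.generate("Explain the difference between a list and a tuple in Python.")
print(response)
```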
For optimal performance when handling multiple user requests simultaneously, such as in a chatbot application, processing them asynchronously is recommended. A minimal example using AsyncUnify is given below:
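A minimal sketch with AsyncUnify might look like the following, again with an illustrative endpoint string and assuming the API key is read from the environment.

```python
import asyncio
import unify

# Asynchronous counterpart of the synchronous example above.
# The endpoint string is illustrative.
async_client = unify.AsyncUnify("llama-3-8b-chat@together-ai")

async def main():
    # On the async client, .generate() is awaitable.
    response = await async_client.generate("Hello! Who was Isaac Newton?")
    print(response)

asyncio.run(main())
```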
For a more applied example, multiple requests can be processed in parallel as follows:
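As a sketch, several prompts can be dispatched concurrently with asyncio.gather. The helper function and the prompts below are illustrative.

```python
import asyncio
import unify

# Illustrative endpoint; the API key is assumed to come from the environment.
async_client = unify.AsyncUnify("llama-3-8b-chat@together-ai")

async def answer(question: str) -> str:
    # Each call can be awaited independently, so requests overlap in time.
    return await async_client.generate(question)

async def main():
    questions = [
        "What is the capital of Spain?",
        "Summarise the plot of Hamlet in one sentence.",
        "Give me a haiku about the sea.",
    ]
    # asyncio.gather schedules all requests concurrently and returns
    # the responses in the same order as the questions.
    responses = await asyncio.gather(*(answer(q) for q in questions))
    for question, response in zip(questions, responses):
        print(question, "->", response)

asyncio.run(main())
```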
Functionality-wise, the asynchronous and synchronous clients are identical.
Multi-LLM Clients
Both MultiLLM and MultiLLMAsync wrap AsyncUnify instances under the hood, such that the LLMs are queried in parallel. The distinction between MultiLLM and MultiLLMAsync refers to whether the .generate() method is itself an asynchronous function, which can be nested inside a broader outer program orchestrated by asyncio.run.
An interactive session with several LLMs can be spun up in Python very quickly, as follows:
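A sketch is given below. The endpoints= argument, the endpoint names, and the assumption that .generate() returns one response per endpoint (shown here as a mapping) are illustrative; check the SDK reference for the exact constructor and return types.

```python
import unify

# The endpoint names and the `endpoints=` argument are illustrative.
client = unify.MultiLLM(
    endpoints=[
        "llama-3-8b-chat@together-ai",
        "gpt-4o@openai",
        "claude-3-opus@anthropic",
    ],
)

# A single .generate() call fans the prompt out to every endpoint in
# parallel and collects one response per LLM (assumed here to be
# returned as a mapping from endpoint to response).
responses = client.generate("What is the best thing about Paris?")
for endpoint, response in responses.items():
    print(endpoint, ":", response)
```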
If you want to query several multi-LLM clients in parallel, then it's best to use MultiLLMAsync, as follows:
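A sketch is shown below, with illustrative endpoint lists and prompts; the key point is that each client's .generate() call is awaitable, so asyncio.gather can run the two multi-LLM queries concurrently.

```python
import asyncio
import unify

# Both endpoint lists are illustrative.
maths_client = unify.MultiLLMAsync(
    endpoints=["gpt-4o@openai", "claude-3-opus@anthropic"],
)
coding_client = unify.MultiLLMAsync(
    endpoints=["llama-3-70b-chat@together-ai", "gpt-4o@openai"],
)

async def main():
    # Each .generate() call is awaitable, so the two multi-LLM queries
    # (and every LLM inside each of them) run concurrently.
    maths_responses, coding_responses = await asyncio.gather(
        maths_client.generate("What is the integral of x^2?"),
        coding_client.generate("Write a Python function to reverse a string."),
    )
    print(maths_responses)
    print(coding_responses)

asyncio.run(main())
```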
As explained in the Arguments page for the Unify client, setters can also be chained for multi-LLM clients, like so:
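A sketch of chained setters is shown below. The setter names set_temperature and set_max_tokens are illustrative placeholders; the Arguments page lists the setters the SDK actually exposes, and each is assumed to return the client itself so that calls can be chained.

```python
import unify

# Illustrative endpoints.
client = unify.MultiLLM(
    endpoints=["gpt-4o@openai", "claude-3-opus@anthropic"],
)

# Each setter is assumed to return the client, so configuration and
# generation can be expressed as a single chained call. The setter
# names below are illustrative; see the Arguments page for the list
# actually supported by the SDK.
responses = (
    client
    .set_temperature(0.5)
    .set_max_tokens(256)
    .generate("Suggest three names for a pet turtle.")
)
print(responses)
```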