Having a back-and-forth conversation with an LLM is a very useful format for engagement; it is one of the primary reasons that ChatGPT became such an instant success.

To make back-and-forth chats easy in the platform, we’ve created a simple ChatBot class which wraps around either the Unify client or the MultiLLM client.

The asynchronous clients are also supported, but there is no benefit in using them over the synchronous versions in this case: the ChatBot.run() method is interactive, and therefore cannot be combined with other .run() calls in parallel under a single asyncio.run call.

Below, we show simple examples, and explain the potential use cases for each client + chatbot combination.

Unify Chatbot

This is the simplest chatbot, running synchronously with a single LLM:

import unify
# Create a synchronous client for a single endpoint.
client = unify.Unify("llama-3-8b-chat@fireworks-ai")
# Wrap the client in a ChatBot and start the interactive loop.
chatbot = unify.ChatBot(client)
chatbot.run()

A back-and-forth conversation with a single LLM is then triggered, such as the following:

> What is the capital of Spain?
The capital of Spain is Madrid.
> Who is their most famous sports player?
Spain has produced many talented sports players, but one of the most famous and
successful is probably Andrés Iniesta.

To create a more ChatGPT-esque experience, streaming can be turned on via stream=True in the unify.Unify constructor, so that the chatbot responses are streamed to the terminal.
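
For example, the earlier snippet becomes:

import unify
# stream=True prints each response to the terminal as it is generated.
client = unify.Unify("llama-3-8b-chat@fireworks-ai", stream=True)
chatbot = unify.ChatBot(client)
chatbot.run()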

MultiLLM Chatbot

To chat with several LLMs in parallel (similar to our chat interface), you can create a multi-LLM chatbot as follows:

import unify
# Pass multiple endpoints; each one is queried in parallel.
client = unify.MultiLLM((
    "gpt-4o@openai",
    "claude-3-opus@anthropic",
    "llama-3-8b-chat@fireworks-ai"
))
chatbot = unify.ChatBot(client)
chatbot.run()

A back-and-forth conversation with multiple LLMs is then triggered, such as the following:

> What is 2+2?
gpt-4o@openai:
2+2 equals 4.

claude-3-opus@anthropic:
The answer to the question "What is 2+2?" is 4.

The question involves simple addition. When you add two plus two, it equals four.

llama-3-8b-chat@fireworks-ai:
The answer to 2+2 is 4!

As explained in the Comparisons section, bear in mind that MultiLLM still uses AsyncUnify instances under the hood, and each LLM is queried asynchronously (in parallel). Therefore, when wrapping MultiLLM, the chatbot response time is governed by the slowest LLM, not by the number of LLMs being conversed with.
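
To illustrate the idea, here is a rough sketch of the fan-out pattern (not the actual MultiLLM internals; it assumes AsyncUnify exposes an awaitable generate() method):

import asyncio
import unify

async def fan_out(prompt, endpoints):
    # One async client per endpoint; all requests fire concurrently,
    # so total latency tracks the slowest endpoint, not the sum.
    clients = [unify.AsyncUnify(endpoint) for endpoint in endpoints]
    replies = await asyncio.gather(*(c.generate(prompt) for c in clients))
    return dict(zip(endpoints, replies))

responses = asyncio.run(fan_out("What is 2+2?", (
    "gpt-4o@openai",
    "llama-3-8b-chat@fireworks-ai",
)))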

Streaming is not supported by any multi-LLM client, and so the responses are always returned as the final full string. The same is therefore true for any multi-LLM chatbot.

With a multi-LLM chatbot, each LLM receives its own unique message history. This can be seen in the following example (a minimal code sketch follows the transcript):

> Tell me a random number between 0 and 100, only respond with the number.
gpt-4o@openai:
42

claude-3-opus@anthropic:
27

llama-3-8b-chat@fireworks-ai:
43

> repeat this number.
gpt-4o@openai:
42

claude-3-opus@anthropic:
27

llama-3-8b-chat@fireworks-ai:
43
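
Conceptually, each endpoint keeps its own independent list of messages. A minimal sketch in plain Python (purely illustrative, not the actual ChatBot internals):

# Hypothetical illustration: one independent history per endpoint.
histories = {
    "gpt-4o@openai": [],
    "claude-3-opus@anthropic": [],
    "llama-3-8b-chat@fireworks-ai": [],
}
def record_turn(endpoint, user_msg, assistant_msg):
    # "repeat this number" is answered against each endpoint's own
    # prior reply, which is why the numbers above differ per model.
    histories[endpoint].append({"role": "user", "content": user_msg})
    histories[endpoint].append({"role": "assistant", "content": assistant_msg})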

The ChatBot class can be helpful for quickly testing out models interactively inside your Python environment, even for applications which are not specifically chatbots.

If the message history is no longer needed, it can be cleared at any point in time by calling .clear_chat_history().

The chat can also be paused at any time by typing pause in the response, and can be exited by typing quit.
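
A minimal usage sketch combining these controls:

chatbot.run()                 # type "pause" at the prompt to stop for now
chatbot.clear_chat_history()  # wipe the stored messages
chatbot.run()                 # start afresh; type "quit" to exit entirely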

For a more visually rich chatbot experience, we’d recommend using our chat interface in the console! 🤖