You can set stream=True in the unify.Unify constructor, such that the chatbot responses are streamed to the terminal.
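For instance, a minimal sketch of a streaming chatbot (the endpoint string is an illustrative placeholder, and passing the client directly to ChatBot is an assumption about the constructor):

```python
import unify

# A single-LLM client with streaming enabled. Substitute any endpoint
# available on your account for the placeholder below.
client = unify.Unify("llama-3-8b-chat@fireworks-ai", stream=True)

# Wrap the client in a ChatBot and start the interactive loop;
# responses will now stream to the terminal as they are generated.
chatbot = unify.ChatBot(client)
chatbot.run()
```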
MultiLLM still uses AsyncUnify instances under the hood, and each LLM is queried asynchronously (in parallel). Therefore, when wrapping a MultiLLM client, the chatbot's response time does not depend on the number of LLMs being conversed with.
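A sketch of this pattern (the endpoint names and the endpoints keyword argument are illustrative assumptions, not a definitive signature):

```python
import unify

# Each endpoint is queried asynchronously (in parallel) under the hood,
# so adding more endpoints should not increase the response time.
client = unify.MultiLLM(
    endpoints=["llama-3-8b-chat@fireworks-ai", "gpt-4o@openai"],
)
chatbot = unify.ChatBot(client)
chatbot.run()
```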
Streaming is not supported by multi-LLM clients, and so the responses are always returned as the final, full string. The same is therefore also true for any multi-LLM chatbots.
With a multi-LLM chatbot, each LLM receives its own unique message history. This can be seen in the following example:
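The sketch below mirrors the bookkeeping conceptually; the variable names and structure are illustrative, not the SDK's actual internals:

```python
# Conceptual sketch only: each endpoint keeps its own conversation thread.
histories = {
    "llama-3-8b-chat@fireworks-ai": [],
    "gpt-4o@openai": [],
}

def record_turn(user_message, responses):
    # The user's message goes into every history, but each LLM's reply is
    # appended only to that LLM's own history, so the threads diverge.
    for endpoint, history in histories.items():
        history.append({"role": "user", "content": user_message})
        history.append({"role": "assistant", "content": responses[endpoint]})

record_turn(
    "Hello!",
    {
        "llama-3-8b-chat@fireworks-ai": "Hi! How can I help?",
        "gpt-4o@openai": "Hello there! What can I do for you?",
    },
)
```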
The ChatBot class can be helpful for quickly testing out models in an interactive manner inside your Python environment, even when the application you're building is not itself a chatbot.
If the message history is no longer needed, it can be cleared at any point in time by calling .clear_chat_history().
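For example, continuing from the sketches above:

```python
# Reset the conversation state without recreating the chatbot.
chatbot.clear_chat_history()
```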
The chat can also be paused at any time by typing pause in the response, and can be exited by typing quit.
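An illustrative session (the prompt formatting shown here is approximate, not the SDK's exact output):

```
> What is the capital of France?
The capital of France is Paris.
> quit
```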
For a more visually rich chatbot experience, we’d recommend using our chat interface in the console! 🤖