When making queries via HTTP Requests, the OpenAI Python Package, or the OpenAI NodeJS Package, the full ChatCompletion is returned, as per the OpenAI Standard.
Firstly, let's take a look at how we can format these returned ChatCompletion
instances more elegantly, so we can easily examine their contents.
As a recap, requests can be made directly to our
REST API as follows:
curl -X 'POST' \
'https://api.unify.ai/v0/chat/completions' \
-H "Authorization: Bearer $UNIFY_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "llama-3-8b-chat@fireworks-ai",
"messages": [{"role": "user", "content": "Say hello."}]
}'
The response is printed on a single line by default, which is not very human-readable:
{"model":"llama-3-8b-chat@fireworks-ai","created":1726589635,"id":"21f88b05-5aea-4e9d-8179-8959ac1c7c03","object":"chat.completion","usage":{"completion_tokens":26,"prompt_tokens":13,"total_tokens":39,"completion_tokens_details":null,"cost":7.8e-6},"choices":[{"finish_reason":"stop","index":0,"message":{"content":"Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?","role":"assistant","tool_calls":null,"function_call":null}}]}
The jq package can be used to render the returned
ChatCompletion
much more elegantly in the terminal:
curl -X 'POST' \
'https://api.unify.ai/v0/chat/completions' \
-H "Authorization: Bearer $UNIFY_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "llama-3-8b-chat@fireworks-ai",
"messages": [{"role": "user", "content": "Say hello."}]
}' | jq .
{
"model": "llama-3-8b-chat@fireworks-ai",
"created": 1726589999,
"id": "5dba39ee-7354-4f98-802d-8666b0668ba1",
"object": "chat.completion",
"usage": {
"completion_tokens": 25,
"prompt_tokens": 13,
"total_tokens": 38,
"completion_tokens_details": null,
"cost": 0.0000076
},
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"content": "Hello! It's nice to meet you. Is there something I can help you with or would you like to chat?",
"role": "assistant",
"tool_calls": null,
"function_call": null
}
}
]
}
Again, as a recap, requests can also be made via the
OpenAI Python Package as follows:
import os
from openai import OpenAI
client = OpenAI(
base_url="https://api.unify.ai/v0/",
api_key=os.environ["UNIFY_KEY"]
)
response = client.chat.completions.create(
model="llama-3.1-405b-chat@fireworks-ai",
messages=[{"role": "user", "content": "Say hi."}]
)
print(response)
As with the REST API request, the response is printed on a single line by default:
ChatCompletion(id='d5f1c31d-f802-4c66-a0ea-74a7163bb680', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Hi! How are you today?', refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1726590242, model='llama-3.1-405b-chat@fireworks-ai', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=8, prompt_tokens=13, total_tokens=21, completion_tokens_details=None, cost=6.3e-05))
The rich package can be used to print the chat
completion much more elegantly:
from rich import print
print(response)
ChatCompletion(
id='a1148d09-48d7-48ab-9001-740af89cc759',
choices=[
Choice(
finish_reason='stop',
index=0,
logprobs=None,
message=ChatCompletionMessage(
content="Hi! How's it going? Is there something I can help you
with or would you like to chat?",
refusal=None,
role='assistant',
function_call=None,
tool_calls=None
)
)
],
created=1726590370,
model='llama-3.1-405b-chat@fireworks-ai',
object='chat.completion',
service_tier=None,
system_fingerprint=None,
usage=CompletionUsage(
completion_tokens=23,
prompt_tokens=13,
total_tokens=36,
completion_tokens_details=None,
cost=0.000108
)
)
Finally, again as a recap, queries can be made via the
OpenAI NodeJS Package as follows:
const openai = require("openai");
const util = require("util");
const client = new openai.OpenAI({
baseURL: "https://api.unify.ai/v0",
apiKey: "API_KEY"
});
(async () => {
const response = await client.chat.completions.create({
model: "llama-3.1-405b-chat@fireworks-ai",
messages: [{ "role": "user", "content": "Say hi." }]
});
console.log(util.inspect(response, { depth: null, colors: true }));
})();
The response is printed as follows:
{
model: 'llama-3.1-405b-chat@fireworks-ai',
created: 1727237176,
id: '48e48dfc-e7af-4b10-b42c-6ea51b493f77',
object: 'chat.completion',
usage: {
completion_tokens: 23,
prompt_tokens: 13,
total_tokens: 36,
completion_tokens_details: null,
cost: 0.000108
},
choices: [
{
finish_reason: 'stop',
index: 0,
message: {
content: "Hi! How's it going? Is there something I can help you with or would you like to chat?",
role: 'assistant',
tool_calls: null,
function_call: null
}
}
]
}
Unify Python Client
In order to make things more user-friendly, by default the
Unify Python Client
only returns the message content of the first choice among the returned choices.
In Python, this is indexed like so: response.choices[0].message.content
import unify
client = unify.Unify("llama-3-8b-chat@fireworks-ai")
response = client.generate("hello world!")
print(response)
Hello World! It's nice to meet you! Is there something I can help you with, or would you like to chat?
The full ChatCompletion
can also easily be returned in the Python client by setting
return_full_completion=True
, either in the constructor
(making it a Default Argument)
or in the generate
method, as below:
import unify
client = unify.Unify("llama-3-8b-chat@fireworks-ai", return_full_completion=True)
response = client.generate("hello world!")
print(response)
ChatCompletion(
id='09a17ba1-f1d1-4fc9-90ac-1502e3e57e03',
choices=[
Choice(
finish_reason='stop',
index=0,
logprobs=None,
message=ChatCompletionMessage(
content="Hello World! It's great to meet you! Is there something
I can help you with, or would you like to chat?",
refusal=None,
role='assistant',
function_call=None,
tool_calls=None
)
)
],
created=1726758685,
model='llama-3-8b-chat@fireworks-ai',
object='chat.completion',
service_tier=None,
system_fingerprint=None,
usage=CompletionUsage(
completion_tokens=27,
prompt_tokens=13,
total_tokens=40,
completion_tokens_details=None,
cost=8e-06
)
)
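The same flag can also be passed on a per-request basis, to the generate method directly rather than the constructor. A minimal sketch, reusing the same model as above:
import unify
client = unify.Unify("llama-3-8b-chat@fireworks-ai")
# request the full ChatCompletion for this call only
response = client.generate("hello world!", return_full_completion=True)
print(response)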
Again, we can make the ChatCompletion
output more concise by setting the repr mode via
unify.set_repr_mode("concise")
. Aside from removing the None
fields, "concise"
mode also removes all fields apart from choices
.
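For example, reusing the response from the snippet above (a minimal sketch, assuming the repr mode applies to all subsequent prints):
import unify
# switch the global representation mode, then print the same response again
unify.set_repr_mode("concise")
print(response)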
ChatCompletion(
choices=[
Choice(
finish_reason='stop',
index=0,
message=ChatCompletionMessage(
content="Hello World! It's great to meet you! Is there something
I can help you with, or would you like to chat?",
role='assistant'
)
)
]
)
The reason we omit all other fields when visualizing ChatCompletion
instances in
"concise"
mode is that Unify as a platform is primarily built for evaluations.
From this perspective, the focus is on tracking the input-output behaviour of LLMs.
We already explained our definition of a prompt in the
Prompts section.
As a recap, our definition of a prompt is as follows:
A prompt is a json object containing all arguments in the OpenAI chat
completions request body which impact the output.
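For instance, under this definition a prompt could look something like the following (an illustrative sketch only; the exact fields depend on the request being made):
# every field here affects the generated output, so all of it forms part of the prompt
prompt = {
    "messages": [{"role": "user", "content": "Say hello."}],
    "temperature": 0.5,
    "max_tokens": 100,
}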
Similarly, when it comes to evaluating LLM responses, we’re only concerned with the
aspects of the returned ChatCompletion
which are impacted by the input.
Looking through OpenAI’s
ChatCompletion description,
we can see that the only field which is affected by the input is the choices
field.
As such, everything else returned in the ChatCompletion
instance is irrelevant
from the perspective of evaluations. As before, even when "concise"
mode is set,
you can view the full ChatCompletion
instance with all metadata like so:
print(response.full_repr())
Broader Usage
Aside from being the default response type from the
chat/completions
endpoint, ChatCompletion
instances are also what are logged in the platform
(see Logging section), cached
(see Caching section),
and stored in datasets
(see Datasets section).
This section is mainly intended to give a quick introduction to the ChatCompletion
return type,
explain how to print it in a human-readable way when making queries via the various
options available, and explain some of the design decisions we’ve made for our own
Python client.
To learn more about what each field represents, you should consult
OpenAI’s documentation.
If anything is unclear, let us know on
discord! 👾