When making queries via HTTP Requests, the OpenAI Python Package, and the OpenAI NodeJS Package, then the full ChatCompletion is returned, as per the OpenAI Standard.

Firstly, lets take a look at how we can format these returned ChatCompletion instances more elegantly, so we can easily examine the contents.

Formatting

As a recap, requests can be made directly to our REST API as follows:

curl -X 'POST' \
    'https://api.unify.ai/v0/chat/completions' \
    -H "Authorization: Bearer $UNIFY_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "llama-3-8b-chat@fireworks-ai",
        "messages": [{"role": "user", "content": "Say hello."}]
    }'

The response is printed on a single line by default, which is not very human-readable:

{"model":"llama-3-8b-chat@fireworks-ai","created":1726589635,"id":"21f88b05-5aea-4e9d-8179-8959ac1c7c03","object":"chat.completion","usage":{"completion_tokens":26,"prompt_tokens":13,"total_tokens":39,"completion_tokens_details":null,"cost":7.8e-6},"choices":[{"finish_reason":"stop","index":0,"message":{"content":"Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?","role":"assistant","tool_calls":null,"function_call":null}}]}

The jq package can be used to render the returned ChatCompletion much more elegantly in the terminal:

curl -X 'POST' \
    'https://api.unify.ai/v0/chat/completions' \
    -H "Authorization: Bearer $UNIFY_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
    "model": "llama-3-8b-chat@fireworks-ai",
        "messages": [{"role": "user", "content": "Say hello."}]
    }' | jq .
{
  "model": "llama-3-8b-chat@fireworks-ai",
  "created": 1726589999,
  "id": "5dba39ee-7354-4f98-802d-8666b0668ba1",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 25,
    "prompt_tokens": 13,
    "total_tokens": 38,
    "completion_tokens_details": null,
    "cost": 0.0000076
  },
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Hello! It's nice to meet you. Is there something I can help you with or would you like to chat?",
        "role": "assistant",
        "tool_calls": null,
        "function_call": null
      }
    }
  ]
}

Again, as a recap, requests can also be made via the OpenAI Python Package as follows:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.unify.ai/v0/",
    api_key=os.environ["UNIFY_KEY"]
)

response = client.chat.completions.create(
    model="llama-3.1-405b-chat@fireworks-ai",
    messages=[{"role": "user", "content": "Say hi."}]
)
print(response)

As with the REST API request, the response is printed on a single line by default:

ChatCompletion(id='d5f1c31d-f802-4c66-a0ea-74a7163bb680', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Hi! How are you today?', refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1726590242, model='llama-3.1-405b-chat@fireworks-ai', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=8, prompt_tokens=13, total_tokens=21, completion_tokens_details=None, cost=6.3e-05))

The rich package can be used to print the chat completion much more elegantly:

from rich import print
print(response)
ChatCompletion(
    id='a1148d09-48d7-48ab-9001-740af89cc759',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content="Hi! How's it going? Is there something I can help you
with or would you like to chat?",
                refusal=None,
                role='assistant',
                function_call=None,
                tool_calls=None
            )
        )
    ],
    created=1726590370,
    model='llama-3.1-405b-chat@fireworks-ai',
    object='chat.completion',
    service_tier=None,
    system_fingerprint=None,
    usage=CompletionUsage(
        completion_tokens=23,
        prompt_tokens=13,
        total_tokens=36,
        completion_tokens_details=None,
        cost=0.000108
    )
)

Finally, again as a recap, queries to the OpenAI NodeJS Package can be made as follows:

const openai = require("openai");
const util = require("util")

const client = new openai.OpenAI({
    baseURL: "https://api.unify.ai/v0",
    apiKey: "API_KEY"
});

(async () => {
    const response = await client.chat.completions.create({
        model: "llama-3.1-405b-chat@fireworks-ai",
        messages: [{ "role": "user", "content": "Say hi." }]
    });
    console.log(util.inspect(response, { depth: null, colors: true }));
})();

The response is printed as follows:

{
  model: 'llama-3.1-405b-chat@fireworks-ai',
  created: 1727237176,
  id: '48e48dfc-e7af-4b10-b42c-6ea51b493f77',
  object: 'chat.completion',
  usage: {
    completion_tokens: 23,
    prompt_tokens: 13,
    total_tokens: 36,
    completion_tokens_details: null,
    cost: 0.000108
  },
  choices: [
    {
      finish_reason: 'stop',
      index: 0,
      message: {
        content: "Hi! How's it going? Is there something I can help you with or would you like to chat?",
        role: 'assistant',
        tool_calls: null,
        function_call: null
      }
    }
  ]
}

Unify Python Client

In order to make things more user friendly, by default the Unify Python Client only returns the message content of the first choice among the returned choices. In Python, this is indexed like so: response.choices[0].message.content

import unify
client = unify.Unify("llama-3-8b-chat@fireworks-ai")
response = client.generate("hello world!")
print(response)
Hello World! It's nice to meet you! Is there something I can help you with, or would you like to chat?

The full ChatCompletion can also easily be returned in the Python client, by setting return_full_completion=True, either in the constructor (making it a Default Argument), or in the generate method, as below:

import unify
client = unify.Unify("llama-3-8b-chat@fireworks-ai", return_full_completion=True)
response = client.generate("hello world!")
print(response)
ChatCompletion(
    id='09a17ba1-f1d1-4fc9-90ac-1502e3e57e03',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content="Hello World! It's great to meet you! Is there something
I can help you with, or would you like to chat?",
                refusal=None,
                role='assistant',
                function_call=None,
                tool_calls=None
            )
        )
    ],
    created=1726758685,
    model='llama-3-8b-chat@fireworks-ai',
    object='chat.completion',
    service_tier=None,
    system_fingerprint=None,
    usage=CompletionUsage(
        completion_tokens=27,
        prompt_tokens=13,
        total_tokens=40,
        completion_tokens_details=None,
        cost=8e-06
    )
)

Again, we can make the ChatCompletion output more concise by setting the mode unify.set_repr_mode("concise"). Aside from removing the None fields, "concise" mode also removes all fields apart from choices:

ChatCompletion(
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            message=ChatCompletionMessage(
                content="Hello World! It's great to meet you! Is there something
I can help you with, or would you like to chat?",
                role='assistant'
            )
        )
    ]
)

The reason we omit all other fields when visualizing ChatCompletion instances in "concise" mode is because Unify as a platform is primarily built for evaluations. From this perspective, the focus is on tracking the input-output behaviour of LLMs. We already explained our definition of a prompt in the Prompts section. As a recap, our definition of a prompt is as follows:

A prompt is a json object containing all arguments in the OpenAI chat completions request body which impact the output.

Similarly, when it comes to evaluating LLM responses, we’re only concerned with the aspects in the ChatCompletion returned which are are impacted by the input. Looking through OpenAI’s ChatCompletion description, we can see that the only field which is affected by the input is the choices field.

As such, everything else returned in the ChatCompletion instance is irrelevant from the perspective of evaluations. As before, even when "concise" mode is set, you can view the full ChatCompletion instance with all meta data like so:

print(response.full_repr())

Broader Usage

Aside from being the default response type from the chat/completions endpoint, ChatCompletion instances are also what are logged in the platform (see Logging section) and cached (see Caching section).

This section is mainly to give a quick introduction to the ChatCompletion return type, explain how to print this in a human-readable way when making queries via the various options available, and explain some of the design decisions we’ve made for our own Python client.

To learn more about what each field represents, you should consult OpenAI’s documentation. If anything is unclear, let us know on discord! 👾