Logging enables previously sent requests to be retrieved at a later date for analysis, reporting, debugging and/or testing. For convenience, these prompts can be filtered on many factors, such as arbitrary string tags, the model, the provider, the endpoint, and the start and end times of the retrieval window.

Queries sent to Unify’s chat completion API can be logged behind the scenes (see Via Unify), and queries made to external LLM providers can be logged explicitly (see Other Clients). Both are explained below.

Via Unify

If the LLM queries are handled by Unify, they can be tagged at request time so that every logged query is sensibly categorized.

Let’s assume we’re building an AI educational assistant that serves many students across many different subjects. In this case, it makes sense to tag queries by both the subject and the student:

curl --request POST \
  --url 'https://api.unify.ai/v0/chat/completions' \
  --header "Authorization: Bearer $UNIFY_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "llama-3-8b-chat@together-ai",
    "messages": [
        {
            "role": "user",
            "content": "What is the capital of Spain?"
        }
    ],
    "tags": ["geography", "john_smith"]
}'

In Python, this would look like:

import unify
client = unify.Unify("llama-3-8b-chat@together-ai")
client.generate("What is the capital of Spain?", tags=["geography", "john_smith"])

If you want to turn off logging entirely, this can be done via the usage page in your console.

Provided that logging is turned on in the console and you have a Professional Plan or higher, logging can be controlled on a prompt-by-prompt basis by specifying one or both of the arguments log_query_body and log_response_body in the chat completions endpoint.

As a cURL request, this looks as follows:

curl --request POST \
  --url 'https://api.unify.ai/v0/chat/completions' \
  --header "Authorization: Bearer $UNIFY_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "llama-3-8b-chat@together-ai",
    "messages": [
        {
            "role": "user",
            "content": "What is the capital of Spain?"
        }
    ],
    "tags": ["geography", "john_smith"],
    "log_query_body": true,
    "log_response_body": false
}'

In Python, the arguments are passed as keywords:

import unify
client = unify.Unify("llama-3-8b-chat@together-ai")
client.generate(
    "What is the capital of Spain?",
    tags=["geography", "john_smith"],
    log_query_body=True,
    log_response_body=False
)

If log_query_body is False, then log_response_body will be ignored. In other words, a response cannot be logged without a corresponding logged query.
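
The dependency between the two flags can be written down as a small function. This is only an illustrative sketch of the rule stated above, not Unify's implementation; the name resolve_logging_flags is hypothetical.

```python
def resolve_logging_flags(log_query_body: bool, log_response_body: bool) -> tuple[bool, bool]:
    """Return the (query, response) flags that actually take effect.

    Hypothetical helper: it only mirrors the documented rule that a
    response is never logged without its corresponding query.
    """
    effective_response = log_query_body and log_response_body
    return log_query_body, effective_response


# Print the effective flags for every combination of inputs.
for query_flag in (True, False):
    for response_flag in (True, False):
        print((query_flag, response_flag), "->", resolve_logging_flags(query_flag, response_flag))
```

The only combination that changes is a disabled query body with an enabled response body, which resolves to logging nothing.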

Other Clients

If you are not querying your LLM via Unify, you can still manually log your prompts to the Unify platform via a cURL request, as follows:

curl --request POST \
  --url 'https://api.unify.ai/v0/queries' \
  --header "Authorization: Bearer $UNIFY_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "endpoint": "llama-3.1-8b-chat_ollama@external",
    "query_body": {
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of Spain?"
            }
        ],
        "tags": ["geography", "john_smith"]
    }
}'

Similarly, if you also want to include the LLM response in your logged query, you can use the response_body key as follows:

curl --request POST \
  --url 'https://api.unify.ai/v0/queries' \
  --header "Authorization: Bearer $UNIFY_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "endpoint": "llama-3.1-8b-chat_ollama@external",
    "query_body": {
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of Spain?"
            }
        ],
        "tags": ["geography", "john_smith"]
    },
    "response_body": {
        "choices": [
            {
                "finish_reason": "stop",
                "index": 0,
                "message": {
                    "content": "The capital of Spain is Madrid.",
                    "role": "assistant"
                }
            }
        ]
    }
}'
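
For readers who prefer to build the same request from Python without any extra dependencies, the sketch below assembles it with the standard library. The URL, headers, and payload keys are taken from the cURL example; the helper name build_log_request and the placeholder key are assumptions, and the request is only constructed, not sent.

```python
import json
import urllib.request

QUERIES_URL = "https://api.unify.ai/v0/queries"  # URL taken from the cURL example


def build_log_request(api_key, endpoint, query_body, response_body=None):
    """Build (but do not send) the POST request that logs an external query."""
    payload = {"endpoint": endpoint, "query_body": query_body}
    if response_body is not None:
        payload["response_body"] = response_body
    return urllib.request.Request(
        QUERIES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_log_request(
    api_key="YOUR_UNIFY_KEY",  # placeholder, not a real key
    endpoint="llama-3.1-8b-chat_ollama@external",
    query_body={"messages": [{"role": "user", "content": "What is the capital of Spain?"}]},
    response_body={
        "choices": [
            {
                "finish_reason": "stop",
                "index": 0,
                "message": {"content": "The capital of Spain is Madrid.", "role": "assistant"},
            }
        ]
    },
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
print(request.get_method(), request.full_url)
```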

In Python, this is more convenient via the unify.with_logging decorator, as follows for the OpenAI client:

import unify
from openai import OpenAI
client = OpenAI()
client.chat.completions.create = unify.with_logging(client.chat.completions.create, endpoint="gpt-4o_oai@external")
res = client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": "Say hi."}])

The same can be done for any other provider, such as Ollama:

import unify
import ollama
ollama.chat = unify.with_logging(ollama.chat, endpoint="llama-3.1-8b-chat_ollama@external")
res = ollama.chat(model="llama3.1", messages=[{"role": "user", "content": "Say hi."}])
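
To make the wrapping pattern concrete, here is a minimal sketch of how a function like this could work for any callable. It is not the unify.with_logging implementation: with_logging_sketch, fake_chat, and the endpoint string are hypothetical, and records go to an in-memory list rather than to the Unify platform.

```python
import functools

logged = []  # in-memory stand-in for the Unify platform


def with_logging_sketch(fn, endpoint):
    """Wrap any callable so each call is recorded alongside its result.

    Hypothetical stand-in for unify.with_logging: it appends to a local
    list instead of calling the REST API.
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        logged.append({"endpoint": endpoint, "query_body": kwargs, "response_body": result})
        return result
    return wrapper


def fake_chat(model, messages):
    """Toy provider call, used so the sketch runs without a real LLM."""
    return {"message": {"role": "assistant", "content": "hi"}}


chat = with_logging_sketch(fake_chat, endpoint="fake-model_local@external")
res = chat(model="fake-model", messages=[{"role": "user", "content": "Say hi."}])
print(len(logged), logged[0]["endpoint"])
```

Because the wrapper returns the provider's result unchanged, the calling code does not need to know that logging is happening.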

As before, the arguments log_query_body and log_response_body can be specified, controlling exactly what is logged to the account.

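import unify
import ollama
ollama.chat = unify.with_logging(
    ollama.chat,
    endpoint="llama-3.1-8b-chat_ollama@external",
    log_query_body=True, log_response_body=False
)
res = ollama.chat(model="llama3.1", messages=[{"role": "user", "content": "Say hi."}])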

By default, unify.with_logging assumes that tags is empty. The logging arguments can either be specified when unify.with_logging is called as a decorator, or be intercepted from the call to the wrapped inner function.

Arguments passed to the wrapped inner function override arguments passed directly to unify.with_logging, except for the tags argument, which is extended with the additional tags.

For example, the following will tag the prompts as tagA and tagB.

import unify
import ollama
ollama.chat = unify.with_logging(
    ollama.chat,
    endpoint="llama-3.1-8b-chat_ollama@external",
    tags=["tagA", "tagB"]
)
res = ollama.chat(model="llama3.1", messages=[{"role": "user", "content": "Say hi."}])

The following will also tag the prompts as tagA and tagB.

import unify
import ollama
ollama.chat = unify.with_logging(
    ollama.chat,
    endpoint="llama-3.1-8b-chat_ollama@external"
)
res = ollama.chat(model="llama3.1", messages=[{"role": "user", "content": "Say hi."}], tags=["tagA", "tagB"])

However, the following will tag the prompts with tagA, tagB and tagC.

import unify
import ollama
ollama.chat = unify.with_logging(
    ollama.chat,
    endpoint="llama-3.1-8b-chat_ollama@external",
    tags=["tagA", "tagB"]
)
res = ollama.chat(model="llama3.1", messages=[{"role": "user", "content": "Say hi."}], tags=["tagB", "tagC"])
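
The override-and-extend behaviour described above can be sketched as a plain merge function. This is an illustration under stated assumptions, not the library's code: merge_logging_kwargs is a hypothetical name, and collapsing duplicate tags is an assumption inferred from the tagA, tagB and tagC result.

```python
def merge_logging_kwargs(decorator_kwargs, call_kwargs):
    """Combine logging arguments from the decorator and the inner call.

    Hypothetical helper: inner-call values win, except tags, which are
    the decorator tags extended by any inner-call tags not already present.
    """
    merged = {**decorator_kwargs, **call_kwargs}
    tags = list(decorator_kwargs.get("tags", []))
    for tag in call_kwargs.get("tags", []):
        if tag not in tags:
            tags.append(tag)
    merged["tags"] = tags
    return merged


# Tags given only to the decorator, only to the call, and to both.
print(merge_logging_kwargs({"tags": ["tagA", "tagB"]}, {})["tags"])
print(merge_logging_kwargs({}, {"tags": ["tagA", "tagB"]})["tags"])
print(merge_logging_kwargs({"tags": ["tagA", "tagB"]}, {"tags": ["tagB", "tagC"]})["tags"])
```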

If you’ve already processed the input and output and would like to log the query retrospectively, you can use unify.log_query, one of the utility functions that thinly wrap the REST API:

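import unify
import ollama
kw = dict(
    model="llama3.1",
    messages=[{"role": "user", "content": "Say hi."}],
    tags=["tagB", "tagC"]
)
# The unwrapped ollama.chat does not accept a tags argument, so strip it before the call
res = ollama.chat(**{k: v for k, v in kw.items() if k != "tags"})
# Argument names endpoint and query_body are assumed to mirror the REST payload keys
unify.log_query(
    endpoint="llama-3.1-8b-chat_ollama@external",
    query_body=kw,
    response_body={"response": res},
    tags=["A", "B"]
)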

Retrieving Queries

Every query made via the API or manually logged can be retrieved at a later stage with a GET request to the /queries endpoint, as follows:

curl --request GET \
  --url 'https://api.unify.ai/v0/queries?tags=maths,john_smith' \
  --header "Authorization: Bearer $UNIFY_KEY"

Again, in Python this would look like:

import unify
prompts = unify.utils.get_queries(tags=["maths", "john_smith"])
for prompt in prompts:
    print(prompt)

We could also query only "maths" and return the maths prompts for all students, or we could query only "john_smith" and return the prompts across all subjects for this student.

To retrieve every logged query, simply leave the tags argument empty.
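
The tag-filtering behaviour can be pictured with a small in-memory example. This sketch assumes that a query matches only when it carries every requested tag, which is what the subject and student examples suggest; filter_by_tags and the record fields are hypothetical and do not reflect how the platform stores queries.

```python
def filter_by_tags(queries, tags=()):
    """Keep the queries that carry every requested tag (all of them if tags is empty)."""
    wanted = set(tags)
    return [q for q in queries if wanted.issubset(q["tags"])]


# Toy records standing in for logged queries; the field names are illustrative.
queries = [
    {"prompt": "What is 2 + 2?", "tags": ["maths", "john_smith"]},
    {"prompt": "What is the capital of Spain?", "tags": ["geography", "john_smith"]},
    {"prompt": "What is 3 * 3?", "tags": ["maths", "jane_doe"]},
]

print(len(filter_by_tags(queries, ["maths", "john_smith"])))  # one subject, one student
print(len(filter_by_tags(queries, ["maths"])))                # one subject, all students
print(len(filter_by_tags(queries, ["john_smith"])))           # one student, all subjects
print(len(filter_by_tags(queries)))                           # no tags: everything
```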

If there is a lot of production traffic, you can also limit the retrieval to a specific time window, using the argument start_time (and optionally also end_time), like so:

import unify
import datetime
start_time = datetime.datetime.now() - datetime.timedelta(weeks=1)
prompts = unify.utils.get_queries(tags=["maths", "john_smith"], start_time=start_time)
for prompt in prompts:
    print(prompt)
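
When calling the REST endpoint directly, the same filters have to be encoded into the query string. The sketch below builds such a URL with the standard library. Only the comma-separated tags format appears in the cURL example; sending start_time and end_time as ISO 8601 query parameters is an assumption, and build_queries_url is a hypothetical helper.

```python
import datetime
from urllib.parse import urlencode

QUERIES_URL = "https://api.unify.ai/v0/queries"  # URL taken from the cURL example


def build_queries_url(tags=None, start_time=None, end_time=None):
    """Build the GET URL for retrieving logged queries.

    Assumption: start_time and end_time are sent as ISO 8601 strings;
    only the comma-separated tags format is shown in the cURL example.
    """
    params = {}
    if tags:
        params["tags"] = ",".join(tags)
    if start_time is not None:
        params["start_time"] = start_time.isoformat()
    if end_time is not None:
        params["end_time"] = end_time.isoformat()
    if not params:
        return QUERIES_URL
    # Keep commas literal so the tags parameter matches the cURL example.
    return f"{QUERIES_URL}?{urlencode(params, safe=',')}"


# A fixed date keeps the output reproducible; real code would use datetime.now().
start_time = datetime.datetime(2024, 1, 1) - datetime.timedelta(weeks=1)
print(build_queries_url(tags=["maths", "john_smith"]))
print(build_queries_url(tags=["maths", "john_smith"], start_time=start_time))
```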

Extracting historic prompts in this manner can also be useful for creating prompt datasets from production traffic, as explained in the Production Data section.