Logging enables previously sent requests to be retrieved at a later date for analysis,
reporting, debugging and/or testing. For convenience, these prompts can be filtered
on many factors, such as arbitrary string tags, the model, the provider,
the endpoint, and the start and end times of the retrieval window.
Queries sent to Unify's chat completions API can be logged behind the scenes
(see Via Unify), and queries made to external LLM providers can also be logged
explicitly (see Other Clients), both of which are explained below.
Via Unify
If your LLM queries are being handled by Unify, then you can make queries as follows,
such that they are all sensibly categorized (tagged).
Let's assume we're building an AI educational assistant which serves many students
across many different subjects. In this case, it would make sense to tag queries based on
both the subject and the student:
curl --request POST \
  --url 'https://api.unify.ai/v0/chat/completions' \
  --header "Authorization: Bearer $UNIFY_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "llama-3-8b-chat@together-ai",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of Spain?"
      }
    ],
    "tags": [
      "geography",
      "john_smith"
    ]
  }'
In Python, this would look like:
import unify
client = unify.Unify("llama-3-8b-chat@together-ai")
client.generate("What is the capital of Spain?", tags=["geography", "john_smith"])
If you want to turn off logging entirely, this can be done via the usage page in your
console.
Presuming that logging is turned on in the console, and that you have a
Professional Plan or higher, logging can then be controlled on a prompt-by-prompt basis,
by specifying one or both of the arguments log_query_body and log_response_body
in the chat completions endpoint.
As a cURL request, this looks as follows:
curl --request POST \
  --url 'https://api.unify.ai/v0/chat/completions' \
  --header "Authorization: Bearer $UNIFY_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "llama-3-8b-chat@together-ai",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of Spain?"
      }
    ],
    "tags": [
      "geography",
      "john_smith"
    ],
    "log_query_body": true,
    "log_response_body": false
  }'
In Python, the arguments are passed as keywords:
import unify
client = unify.Unify("llama-3-8b-chat@together-ai")
client.generate(
    "What is the capital of Spain?",
    tags=["geography", "john_smith"],
    log_query_body=True,
    log_response_body=False
)
If log_query_body is False, then log_response_body will be ignored.
In other words, a response cannot be logged without a corresponding logged query.
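For instance, in the following sketch (using the same client as above), the log_response_body flag has no effect, because the query body itself is not being logged:
import unify
client = unify.Unify("llama-3-8b-chat@together-ai")
# log_response_body is ignored here, since log_query_body is False
client.generate(
    "What is the capital of Spain?",
    tags=["geography", "john_smith"],
    log_query_body=False,
    log_response_body=True
)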
Other Clients
If you are not deploying your LLM via Unify, you can still manually log your prompts
to the Unify platform via a cURL request as follows:
curl --request POST \
  --url 'https://api.unify.ai/v0/queries' \
  --header "Authorization: Bearer $UNIFY_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "endpoint": "llama-3.1-8b-chat_ollama@external",
    "query_body": {
      "messages": [
        {
          "role": "user",
          "content": "What is the capital of Spain?"
        }
      ],
      "tags": [
        "geography",
        "john_smith"
      ]
    }
  }'
Similarly, if you also want to include the LLM response in your logged query,
you can use the response_body key as follows:
curl --request POST \
  --url 'https://api.unify.ai/v0/queries' \
  --header "Authorization: Bearer $UNIFY_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "endpoint": "llama-3.1-8b-chat_ollama@external",
    "query_body": {
      "messages": [
        {
          "role": "user",
          "content": "What is the capital of Spain?"
        }
      ],
      "tags": [
        "geography",
        "john_smith"
      ]
    },
    "response_body": {
      "choices": [
        {
          "finish_reason": "stop",
          "index": 0,
          "message": {
            "content": "The capital of Spain is Madrid.",
            "role": "assistant"
          }
        }
      ]
    }
  }'
In Python, this is more convenient via the unify.with_logging decorator,
as follows for the OpenAI client:
import unify
from openai import OpenAI
client = OpenAI()
client.chat.completions.create = unify.with_logging(client.chat.completions.create, endpoint="gpt-4o_oai@external")
res = client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": "Say hi."}])
The same can be done for any other provider.
import unify
import ollama
ollama.chat = unify.with_logging(ollama.chat, endpoint="llama-3.1-8b-chat_ollama@external")
res = ollama.chat(model="llama3.1", messages=[{"role": "user", "content": "Say hi."}])
As before, the arguments log_query_body and log_response_body can be specified,
controlling exactly what is logged to the account.
import unify
import ollama
ollama.chat = unify.with_logging(
    ollama.chat,
    endpoint="llama-3.1-8b-chat_ollama@external",
    log_query_body=True,
    log_response_body=False
)
res = ollama.chat(model="llama3.1", messages=[{"role": "user", "content": "Say hi."}])
By default, unify.with_logging assumes tags to be empty.
The log arguments can either be specified when unify.with_logging is called as a
decorator, or they can be intercepted from the wrapped inner function call.
Arguments passed to the inner wrapped function override arguments passed directly to
unify.with_logging, except for the tags argument, which is extended with the
additional tags.
For example, the following will tag the prompts as tagA and tagB.
import unify
import ollama
ollama.chat = unify.with_logging(
    ollama.chat,
    endpoint="llama-3.1-8b-chat_ollama@external",
    tags=["tagA", "tagB"]
)
res = ollama.chat(model="llama3.1", messages=[{"role": "user", "content": "Say hi."}])
The following will also tag the prompts as tagA and tagB.
import unify
import ollama
ollama.chat = unify.with_logging(
    ollama.chat,
    endpoint="llama-3.1-8b-chat_ollama@external"
)
res = ollama.chat(model="llama3.1", messages=[{"role": "user", "content": "Say hi."}], tags=["tagA", "tagB"])
However, the following will tag the prompts with tagA, tagB and tagC.
import unify
import ollama
ollama.chat = unify.with_logging(
    ollama.chat,
    endpoint="llama-3.1-8b-chat_ollama@external",
    tags=["tagA"]
)
res = ollama.chat(model="llama3.1", messages=[{"role": "user", "content": "Say hi."}], tags=["tagB", "tagC"])
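The same call-time override applies to the log flags. In the sketch below (illustrative, following the override behaviour described above), the log_response_body value passed to the wrapped call takes precedence over the one given to unify.with_logging, so the response is not logged:
import unify
import ollama
ollama.chat = unify.with_logging(
    ollama.chat,
    endpoint="llama-3.1-8b-chat_ollama@external",
    log_response_body=True
)
# the call-time value wins, so the response body is not logged
res = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Say hi."}],
    log_response_body=False
)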
If you've already processed the input and output, and would like to retrospectively log
the query, you can make use of unify.log_query, one of the utility functions that
thinly wrap the REST API:
import unify
import ollama
kw = dict(
    model="llama3.1",
    messages=[{"role": "user", "content": "Say hi."}]
)
res = ollama.chat(**kw)
unify.log_query(
    endpoint="llama-3.1-8b-chat_ollama@external",
    query_body=kw,
    response_body={"response": res},
    tags=["A", "B"]
)
Retrieving Queries
Every query made via the API or manually logged can then be retrieved at a later
stage, using a GET request to the /queries endpoint, as follows:
curl --request GET \
  --url 'https://api.unify.ai/v0/queries?tags=maths,john_smith' \
  --header "Authorization: Bearer $UNIFY_KEY"
Again, in Python this would look like:
import unify
prompts = unify.utils.get_queries(tags=["maths", "john_smith"])
for prompt in prompts:
    print(prompt)
We could also query only "maths" and return the maths prompts for all students,
or query only "john_smith" and return this student's prompts across all subjects.
If you simply want to retrieve all queries made, you can leave the tags argument
empty; if you want to retrieve all queries for a student, you can omit the subject
tag, and vice versa.
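For example, these broader retrievals might look as follows (a minimal sketch using the same unify.utils.get_queries helper as above):
import unify
# all maths prompts, across every student
maths_prompts = unify.utils.get_queries(tags=["maths"])
# all of john_smith's prompts, across every subject
john_prompts = unify.utils.get_queries(tags=["john_smith"])
# every logged query, with no tag filtering at all
all_prompts = unify.utils.get_queries()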
If there is a lot of production traffic, you can also limit the retrieval to a specific
time window, using the argument start_time (and optionally also end_time), like so:
import unify
import datetime
start_time = datetime.datetime.now() - datetime.timedelta(weeks=1)
prompts = unify.utils.get_queries(tags=["maths", "john_smith"], start_time=start_time)
for prompt in prompts:
    print(prompt)
Extracting historic prompts in this manner can also be useful for creating prompt
datasets from production traffic, as explained in the
Production Data section.