In the last section we created test sets of varying sizes, ready to evaluate our agent. So, it’s finally time to start our data flywheel spinning! 🔁

In pseudo-code, the general process for optimizing an LLM agent is quite straightforward (a rough Python sketch follows the pseudo-code):

Create simplest possible agent 🤖
While True:
    Create/expand unit tests (evals) 🗂️
    While run(tests) failing: 🧪
        Analyze failures, understand the root cause 🔍
        Vary system prompt, in-context examples, tools, etc. to rectify 🔀
    [Optional] Beta test with users, find more failures 🚦
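
In plain Python, that loop might look roughly like the sketch below. Every helper here (build_agent, expand_evals, run_tests, analyze_failures, update_agent, beta_test) is a hypothetical placeholder for logic we'll build up over the coming sections; none of them come from the unify library.

# Rough sketch of the flywheel loop above. All helper functions are
# hypothetical placeholders, not unify functions.

agent = build_agent()                             # simplest possible agent

while True:
    tests = expand_evals()                        # create/expand unit tests (evals)
    failures = run_tests(agent, tests)
    while failures:                               # keep iterating until everything passes
        root_causes = analyze_failures(failures)  # understand why each test failed
        agent = update_agent(agent, root_causes)  # vary system prompt, examples, tools, ...
        failures = run_tests(agent, tests)
    beta_test(agent)                              # optional: let users surface new failures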

Firstly, let’s activate the MarkingAssistant project.

unify.activate("MarkingAssistant")

Let’s also set a new context, Evals, where we’ll store all of our evaluation runs.

unify.set_context("Evals")
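
With the project activated and the context set, anything we log from here on will land in the Evals context of the MarkingAssistant project. As a quick sanity check (the field names below are just illustrative, not part of our agent), a log such as the following should show up there:

unify.log(test_id="example_0", passed=False)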

Great, we can now dive into the first step of the flywheel! 🤿