In the last section we created test sets of varying sizes, ready to evaluate our agent. So it's finally time to start our data flywheel spinning! 🔁 In pseudo-code, the general process for optimizing an LLM agent is quite straightforward:
```
Create simplest possible agent 🤖
While True:
    Create/expand unit tests (evals) 🗂️
    While run(tests) failing: 🧪
        Analyze failures, understand the root cause 🔍
        Vary system prompt, in-context examples, tools etc. to rectify 🔀
    [Optional] Beta test with users, find more failures 🚦
```
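To make the loop concrete, here is a minimal Python sketch of the same flywheel. Every helper here (`create_agent`, `expand_tests`, `run_tests`, `rectify`) is a hypothetical placeholder, not part of the unify API; swap in your own agent, eval suite, and fix logic.

```python
# Hypothetical skeleton of the flywheel above; all helpers are placeholders.

def create_agent():
    """Return the simplest possible agent, e.g. a single LLM call."""
    return {"system_prompt": "You are a marking assistant."}

def expand_tests(tests):
    """Create/expand the unit tests (evals)."""
    return tests + []  # placeholder: append newly written test cases here

def run_tests(agent, tests):
    """Run the eval suite; return the subset of tests that failed."""
    return [t for t in tests if not t.get("passed", True)]  # placeholder check

def rectify(agent, failure):
    """Vary system prompt, in-context examples, tools etc. to fix a failure."""
    return agent  # placeholder: return the updated agent

agent = create_agent()                            # simplest possible agent
tests = []
while True:                                       # keep the flywheel spinning
    tests = expand_tests(tests)                   # create/expand evals
    while failures := run_tests(agent, tests):    # rerun until all tests pass
        root_cause = failures[0]                  # analyze the first failure
        agent = rectify(agent, root_cause)        # vary prompt/examples/tools
    # [optional] beta test with users to surface more failures
```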
Firstly, let's activate the `MarkingAssistant` project.
```python
import unify

unify.activate("MarkingAssistant")
```
Let's also set a new context `Evals`, where we'll store all of our evaluation runs.
```python
unify.set_context("Evals")
```
Great, we can now dive into the first step of the flywheel! 🤿