In the last section we created test sets of varying sizes, ready to evaluate our agent. So, it's finally time to start our data flywheel spinning! In pseudo-code, the general process for optimizing an LLM agent is quite straightforward:
```
Create simplest possible agent 🤖
While True:
    Create/expand unit tests (evals)
    While run(tests) failing: 🧪
        Analyze failures, understand the root cause
        Vary system prompt, in-context examples, tools etc. to rectify
    [Optional] Beta test with users, find more failures
```
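Translated into a runnable Python skeleton, the loop looks something like the following. Every helper here is a hypothetical stub (none are part of the unify SDK); later sections fill in the real logic:

```python
def build_agent() -> dict:
    # Simplest possible agent: just a system prompt, no tools yet.
    return {"system_prompt": "You are a marking assistant.", "tools": []}

def expand_tests(tests: list) -> list:
    # Create/expand unit tests (evals); stubbed with a single fixed case.
    return tests or [{"question": "What is 1 + 1?", "expected": "2"}]

def run_tests(agent: dict, tests: list) -> list:
    # Run the evals and return the failing cases (stub: everything passes).
    return []

def analyze_failures(failures: list) -> list:
    # Inspect failing traces to understand the root causes.
    return [f["question"] for f in failures]

def apply_fixes(agent: dict, root_causes: list) -> dict:
    # Vary system prompt, in-context examples, tools etc. to rectify.
    return agent

agent = build_agent()
tests: list = []
for _ in range(3):  # "While True" in practice; bounded here so it terminates
    tests = expand_tests(tests)
    while failures := run_tests(agent, tests):
        agent = apply_fixes(agent, analyze_failures(failures))
    # [Optional] beta test with users, find more failures
```

The inner loop only exits once every eval passes, which is what keeps the flywheel honest: new tests are only added once the agent has caught up with the existing ones.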
Firstly, let's activate the `MarkingAssistant` project.
```python
import unify

unify.activate("MarkingAssistant")
```
Let's also set a new context `Evals`, where we'll store all of our evaluation runs.
```python
unify.set_context("Evals")
```
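As a quick sanity check, anything we log from here on should land under the `Evals` context of the `MarkingAssistant` project. A minimal smoke test, assuming `unify.log` accepts arbitrary key/value fields as in the previous sections (the field names below are made up):

```python
import unify

unify.activate("MarkingAssistant")
unify.set_context("Evals")

# With the project active and the context set, this log is stored
# under MarkingAssistant/Evals. The fields are placeholder values.
unify.log(run_name="smoke_test", passed=True)
```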
Great, we can now dive into the first step of the flywheel! 🤿