In the last section we created test sets of varying sizes, ready to evaluate our agent. So, it's finally time to start our data flywheel spinning! 🔁 In pseudo-code, the general process for optimizing an LLM agent is quite straightforward:
```
Create simplest possible agent 🤖
While True:
    Create/expand unit tests (evals) 🗂️
    While run(tests) failing: 🧪
        Analyze failures, understand the root cause 🔍
        Vary system prompt, in-context examples, tools etc. to rectify 🔀
    [Optional] Beta test with users, find more failures 🚦
```
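To make the loop concrete, here's a minimal Python sketch of the same flywheel. All of the helpers it calls (`build_agent`, `expand_evals`, `run_evals`, `diagnose_and_fix`, and `beta_test`) are hypothetical placeholders for your own project code, not part of the unify SDK:

```python
from typing import Any, Callable

def spin_flywheel(
    build_agent: Callable[[], Any],                # simplest possible agent
    expand_evals: Callable[[], list],              # create/expand unit tests (evals)
    run_evals: Callable[[Any, list], list],        # returns the failing test cases
    diagnose_and_fix: Callable[[Any, list], Any],  # root-cause failures, then rectify
    beta_test: Callable[[Any], None],              # optional user testing
) -> None:
    """Hypothetical driver for the agent-optimization loop sketched above."""
    agent = build_agent()
    while True:
        evals = expand_evals()
        failures = run_evals(agent, evals)
        while failures:
            # Analyze failures, then vary the system prompt,
            # in-context examples, tools etc. to rectify.
            agent = diagnose_and_fix(agent, failures)
            failures = run_evals(agent, evals)
        beta_test(agent)  # surfaces new failures, feeding the next iteration
```

Note that the outer loop never terminates by design: each pass expands the eval suite, so the flywheel keeps turning as long as you keep finding failures worth fixing.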
Firstly, let's activate the MarkingAssistant project.
```python
import unify

unify.activate("MarkingAssistant")
```
Let's also set a new context, Evals, where we'll store all of our evaluation runs.
```python
unify.set_context("Evals")
```
Great, we can now dive into the first step of the flywheel! 🤿