.cache.json
file which was previously generated whilst running through this demo.
This avoids the need to make any real LLM calls,
saving time and money,
and it also ensures the walkthrough is fully deterministic.
If you’d rather go down your own unique iteration journey,
then you should skip the cell below,
and either remove cache="read-only"
(to turn off caching)
or replace it with cache=True
(to create your own local cache) in the agent constructor above.
However,
this would mean many parts of the remaining walkthrough might not directly apply in your case,
as the specific failure modes and the order in which they appear are likely to be different.
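To make the options above concrete, here is a rough sketch of how the cache argument fits into the agent constructor. The MathsAgent name and the rest of the signature are placeholders, not the exact constructor defined earlier in the walkthrough; only the cache argument is the point of interest.

```python
# Replay mode used throughout this walkthrough: read responses from the
# downloaded .cache.json, never making real LLM calls.
agent = MathsAgent(cache="read-only")

# Option 1: remove the cache argument entirely to turn off caching
# and make real LLM calls on every run.
agent = MathsAgent()

# Option 2: build and reuse your own local cache as you iterate.
agent = MathsAgent(cache=True)
```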
pretty_print_dict is used for the answer,
as explained above.
Everything without a "_"
in the name
(all "non-private" arguments, returns and intermediate variables)
will automatically be logged when the function is called,
due to the inclusion of the unify.log
decorator.
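As a rough illustration of that naming convention (the helper functions below are placeholders, and the exact decorator syntax may differ slightly from the agent defined above):

```python
import unify

@unify.log  # decorator described above; logs "non-private" values on each call
def solve(question: str) -> float:
    # `question`, `reasoning`, `answer` and the return value are logged
    # automatically, since none of their names contain a "_".
    reasoning = call_llm(question)           # placeholder for the real LLM call
    _scratch = parse_reasoning(reasoning)    # not logged: the name contains "_"
    answer = extract_final_answer(_scratch)  # placeholder helper
    return answer
```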
The unify.Experiment()
term creates an "experiment"
parameter in the context,
with the value "simple_agent"
in this case.
The overwrite=True
argument will remove all prior logs with an experiment parameter equal to "simple_agent".
This is useful if you would like to re-run an experiment (clearing any previous runs).
If you would like to accumulate data for a specific experiment,
then the overwrite
flag should not be set,
and new logs will simply be added to the experiment without deleting any prior logs.
The unify.Params()
term sets the parameters which are held constant throughout the experiment.
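Putting this together, the experiment setup looks roughly like the sketch below. The parameter names passed to unify.Params() and the evaluation loop itself are illustrative placeholders, not the exact code from the walkthrough.

```python
import unify

# Start (or restart) the "simple_agent" experiment, wiping any prior logs
# for this experiment name, and pin the parameters held constant for the run.
with unify.Experiment("simple_agent", overwrite=True), unify.Params(
    system_message=system_message_template,  # illustrative constant parameter
):
    for example in test_set:        # placeholder: the ten evaluation examples
        agent(example["question"])  # the decorated agent logs each call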
By looking in our interface, we can see that we have some failures,
with a mean error of 0.8
across the ten examples.
Let’s take a look at the traces,
to ensure that the system message template has been implemented correctly,
and that each LLM call has the template variables in the system message populated correctly.
It seems as though everything was implemented correctly,
and the per-LLM system messages look good ✅
So,
for the next iteration,
we’ll need to dive in and understand why the agent is failing to make the correct prediction in some cases 🔁