Firstly, the general template needs output instructions for both cases. When sub-questions are present:

“For each sub-question, you should populate the `reasoning` field with your initial reasoning about the correct number of marks to award. Finally, you should put the number of marks to award for this sub-question in the `marks` field.”

When they are not present:

“You should populate the `reasoning` field with your initial reasoning about the correct number of marks to award. Finally, you should put the number of marks to award in the `marks` field.”
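Putting these together, the system message template might look something like the sketch below. Only the two output-instruction strings are taken from above; the surrounding wording and the placeholder names (`{question}`, `{markscheme}`, `{student_answer}`, `{output_instructions}`) are assumptions.

```python
# A minimal sketch of the general system message template. The placeholder
# names are assumptions; the real template's wording may differ.
GENERAL_TEMPLATE = """
Your task is to award marks for a student's answer to a question.

The question is:

{question}

The markscheme is:

{markscheme}

The student's answer is:

{student_answer}

{output_instructions}
"""

# Output instructions injected when the question has sub-questions.
WITH_SUBQUESTIONS = (
    "For each sub-question, you should populate the `reasoning` field with "
    "your initial reasoning about the correct number of marks to award. "
    "Finally, you should put the number of marks to award for this "
    "sub-question in the `marks` field."
)

# Output instructions injected when there are no sub-questions.
WITHOUT_SUBQUESTIONS = (
    "You should populate the `reasoning` field with your initial reasoning "
    "about the correct number of marks to award. Finally, you should put "
    "the number of marks to award in the `marks` field."
)
```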
We also need to update the `call_agent` method to set the output format dynamically.
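A minimal sketch of how this could look, assuming Pydantic models for the dynamic schema and an OpenAI-style structured-output call; the client, model name, and function signatures are assumptions rather than the walkthrough's actual code:

```python
from pydantic import BaseModel, create_model
from openai import OpenAI


class MarksAndReasoning(BaseModel):
    reasoning: str
    marks: int


def create_output_format(subquestions: list[str] | None) -> type[BaseModel]:
    """Build the response schema dynamically based on the sub-questions."""
    if not subquestions:
        return MarksAndReasoning
    # One (reasoning, marks) entry per sub-question, e.g. "a", "b", "c".
    # Sub-question names are assumed to be valid Python identifiers here.
    fields = {name: (MarksAndReasoning, ...) for name in subquestions}
    return create_model("SubQuestionMarks", **fields)


def call_agent(system_message: str, subquestions: list[str] | None) -> str:
    client = OpenAI()
    response = client.beta.chat.completions.parse(
        model="gpt-4o",  # model name is an assumption
        messages=[{"role": "system", "content": system_message}],
        response_format=create_output_format(subquestions),
    )
    # The message content is the raw JSON string matching the schema above.
    return response.choices[0].message.content
```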
The `evaluate` method then needs updating to parse the returned JSON correctly, include a subquestion-level diff, and extend the per-question breakdown to also include the subquestion-level predictions.
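A rough sketch of what this could look like, again with assumed dataset field names (`system_message`, `sub_questions`, `correct_marks`) and a simple exact-match score; the real method will differ in detail:

```python
import json


def evaluate(example: dict) -> dict:
    """Score one question, with a sub-question level breakdown."""
    subquestions = example.get("sub_questions")
    response = call_agent(example["system_message"], subquestions)
    predicted = json.loads(response)
    correct = example["correct_marks"]

    if subquestions:
        # Sub-question level predictions and diffs.
        pred_marks = {k: predicted[k]["marks"] for k in subquestions}
        diff = {k: pred_marks[k] - correct[k] for k in subquestions}
        total_pred, total_correct = sum(pred_marks.values()), sum(correct.values())
    else:
        pred_marks = predicted["marks"]
        diff = pred_marks - correct
        total_pred, total_correct = pred_marks, correct

    return {
        "score": float(total_pred == total_correct),
        "per_question_breakdown": {
            "predicted_marks": pred_marks,  # now includes sub-question predictions
            "diff": diff,                   # sub-question level diff
        },
    }
```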
These changes take the score from 0.2 to 0.4.
Let’s take a look at the traces to ensure that the system message template has been implemented correctly, and that each LLM call has the template variables in its system message populated correctly.
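Alongside eyeballing the traces, a quick programmatic sanity check over the captured system messages could look something like this; the helper and the way the messages are pulled from the traces are hypothetical:

```python
import re


def unpopulated_variables(system_message: str) -> list[str]:
    """Return any template placeholders that were left unfilled.

    If a variable such as {markscheme} was not substituted, it will
    still appear in the message wrapped in braces.
    """
    return re.findall(r"\{[a-z_]+\}", system_message)


# `system_messages` is a hypothetical list of the per-LLM system messages
# pulled out of the traces.
# for msg in system_messages:
#     assert not unpopulated_variables(msg), unpopulated_variables(msg)
```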
It seems as though everything was implemented correctly, and the per-LLM system messages look good ✅
Let’s explore what’s going wrong in the next iteration 🔁