`reasoning` field for each sub-question, with a field for each mark type referenced in the sub-question markscheme, going from the following structure:
`SC1` for Example 207, `M1` for Example 261, and `B1` for Example 132 (c).
Create a `PerMarkReasoning` pydantic type, with one `ThoughtsAndAwardDecision` instance for each mark detected in the sub-question markscheme.
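As a minimal sketch of how such a type could be built dynamically, the snippet below uses pydantic's `create_model`. The field names on `ThoughtsAndAwardDecision` (`thoughts`, `should_award`) and the helper name `create_per_mark_reasoning_format` are assumptions made for illustration, not necessarily what the actual implementation uses.

```python
from pydantic import BaseModel, create_model


class ThoughtsAndAwardDecision(BaseModel):
    # Assumed field names, for illustration only.
    thoughts: str
    should_award: bool


def create_per_mark_reasoning_format(mark_types: list[str]) -> type[BaseModel]:
    """Hypothetical helper: one required ThoughtsAndAwardDecision field per mark."""
    fields = {mark: (ThoughtsAndAwardDecision, ...) for mark in mark_types}
    return create_model("PerMarkReasoning", **fields)


# Example: a sub-question whose markscheme mentions M1 and SC1.
PerMarkReasoning = create_per_mark_reasoning_format(["M1", "SC1"])
example = PerMarkReasoning(
    M1={"thoughts": "Valid method shown.", "should_award": True},
    SC1={"thoughts": "Special case does not apply.", "should_award": False},
)
```

Because every mark is a required field, the model cannot skip over a mark without stating a decision for it.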
Update `MarksAndReasoning` (previously this was statically defined, see previous iteration) such that the `reasoning` field is no longer just a string, but is instead our newly created `PerMarkReasoning` (above).
Update `create_response_format` such that we're making use of our newly defined `create_marks_and_reasoning_format` for each sub-question.
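The nesting described above could look roughly like the following. This is a sketch under assumptions: the `marks` field, the `thoughts` and `should_award` field names, the top-level model name `Response`, and the use of sub-question letters as field names are all illustrative choices rather than the confirmed implementation.

```python
from pydantic import BaseModel, create_model


class ThoughtsAndAwardDecision(BaseModel):
    # Assumed field names, for illustration only.
    thoughts: str
    should_award: bool


def create_marks_and_reasoning_format(mark_types: list[str]) -> type[BaseModel]:
    """Build MarksAndReasoning whose `reasoning` is a per-mark model, not a string."""
    per_mark = create_model(
        "PerMarkReasoning",
        **{mark: (ThoughtsAndAwardDecision, ...) for mark in mark_types},
    )
    return create_model("MarksAndReasoning", reasoning=(per_mark, ...), marks=(int, ...))


def create_response_format(marks_per_sub_question: dict[str, list[str]]) -> type[BaseModel]:
    """Build one MarksAndReasoning field per sub-question (e.g. "a", "b")."""
    fields = {
        sub_q: (create_marks_and_reasoning_format(marks), ...)
        for sub_q, marks in marks_per_sub_question.items()
    }
    return create_model("Response", **fields)


Response = create_response_format({"a": ["M1", "A1"], "b": ["B1"]})
decision = {"thoughts": "Criterion met.", "should_award": True}
parsed = Response(
    a={"reasoning": {"M1": decision, "A1": decision}, "marks": 2},
    b={"reasoning": {"B1": decision}, "marks": 1},
)
```

Each sub-question gets its own schema, so a sub-question with only `B1` in its markscheme is never asked to reason about `M1` or `A1`.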
`update_markscheme`, defined in the previous iteration, which parses the markscheme in the same manner but for a different reason. Let's have the function extract the marks, and also the surrounding context.
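One way to extract marks together with their surrounding context is a regular-expression scan, sketched below. The mark pattern (`M`, `A`, `B` or `SC` followed by digits) and the fixed character window are assumptions about the markscheme format, so treat this as an illustration rather than the actual parser.

```python
import re

# Assumption: mark types look like "M1", "A1", "B2" or "SC1".
MARK_PATTERN = re.compile(r"\b(?:SC|M|A|B)\d+\b")


def parse_available_marks_from_markscheme(
    markscheme: str, window: int = 40
) -> dict[str, list[str]]:
    """Map each mark type to the text snippets surrounding its occurrences."""
    marks: dict[str, list[str]] = {}
    for match in MARK_PATTERN.finditer(markscheme):
        # Keep `window` characters either side of the mark as context.
        start = max(0, match.start() - window)
        end = min(len(markscheme), match.end() + window)
        marks.setdefault(match.group(), []).append(markscheme[start:end].strip())
    return marks


markscheme = "Correct method for expansion M1. Final answer x = 3 A1. Accept 3.0 for SC1."
available = parse_available_marks_from_markscheme(markscheme)
```

The context snippets can then be included in the field descriptions or the prompt, so the model sees what each mark is awarded for.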
Update `call_agent` such that we call `parse_available_marks_from_markscheme` on each sub-question markscheme, and then pass these into our newly defined `create_response_format`.
`o3-mini` is certainly very stubborn about its decision for these questions.
Let’s take a look at the traces, to ensure that the system message template has been implemented correctly, and each LLM call has the template variables in the system message populated correctly.
It seems as though everything was implemented correctly, and the per-LLM system messages look good ✅
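Alongside inspecting the traces by eye, a small check like the one below can flag template variables that were never filled in. It assumes placeholders are written as `{variable_name}`, and the helper name `unfilled_placeholders` is hypothetical.

```python
import re

# Assumption: template variables are written as {variable_name}.
PLACEHOLDER = re.compile(r"\{[A-Za-z_][A-Za-z0-9_]*\}")


def unfilled_placeholders(system_message: str) -> list[str]:
    """Return any {placeholders} still present in a rendered system message."""
    return PLACEHOLDER.findall(system_message)


rendered = "Mark the answer to {question} using the markscheme provided."
leftover = unfilled_placeholders(rendered)
```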
Again, let’s explore what’s going wrong in the next iteration 🔁