0.5.
As usual, let’s take a look and explore why the agent might be failing on the remaining examples 🕵️
Maybe the purely local reasoning has some shortcomings.
Let’s focus on one of the new regressions, to understand why our latest change has disrupted the agent on an example where it was previously very consistently correct.
📄 PDFs
❓ Paper 3 -> Question 19 (b)
☑️ Paper 3 -> Question 19 (b) Markscheme
❓ Parsed Question [1 Mark]
📝 Student's Answer
☑️ Parsed Markscheme
✅ Correct Marks [0/1] Rationale
🤖 Predicted Marks [1/1] ❌ Rationale
🤖 Predicted Marks [x/1] Rationales
simple_agent [0/1] ✅
add_markscheme [0/1] ✅
add_marking_guidelines [0/1] ✅
add_structured_output [0/1] ✅
align_context [0/1] ✅
align_guidelines_and_clarify_reasoning [0/1] ✅
mark_type_reasoning [0/1] ✅
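To make "new regression" concrete, here is a minimal sketch that flags an example as a regression when every earlier agent version matched the correct mark and the latest one does not. The marks are the ones listed above for Question 19 (b); the helper name `is_new_regression` and the variable names are assumptions introduced for illustration only.

```python
# Marks for Paper 3 -> Question 19 (b), copied from the listing above.
correct_marks = 0

# Predicted marks from each earlier agent version (all were correct).
previous_predictions = {
    "simple_agent": 0,
    "add_markscheme": 0,
    "add_marking_guidelines": 0,
    "add_structured_output": 0,
    "align_context": 0,
    "align_guidelines_and_clarify_reasoning": 0,
    "mark_type_reasoning": 0,
}

# Predicted marks from the latest agent version (now incorrect).
latest_prediction = 1


def is_new_regression(correct, previous, latest):
    """Return True when all earlier runs were correct and the latest is not."""
    previously_correct = all(p == correct for p in previous)
    return previously_correct and latest != correct


print(is_new_regression(correct_marks, previous_predictions.values(), latest_prediction))
```

An example that was already wrong in some earlier run is not counted here, since it is a persistent failure rather than something the latest change broke.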
{prior_context}, which will only be included when sub-questions are present.
Let’s also include the full question.
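As a rough sketch of this idea, the template below fills {prior_context} only when there are preceding sub-questions, and always includes the full question. Only the {prior_context} placeholder comes from the text above; the template wording, the {full_question} and {question} placeholders, and the helpers `build_prior_context` and `build_prompt` are hypothetical.

```python
# Hypothetical template: only the {prior_context} placeholder is from the text.
PROMPT_TEMPLATE = (
    "You are marking a student's answer.\n"
    "Full question:\n{full_question}\n"
    "{prior_context}"
    "Sub-question to mark:\n{question}\n"
)


def build_prior_context(preceding_subquestions):
    """Return a context section, or an empty string if there is nothing to add."""
    if not preceding_subquestions:
        return ""
    lines = ["Preceding sub-questions, for context only:"]
    lines += [f"({label}) {text}" for label, text in preceding_subquestions.items()]
    return "\n".join(lines) + "\n"


def build_prompt(full_question, question, preceding_subquestions=None):
    """Fill the template; the prior context vanishes when no sub-questions exist."""
    return PROMPT_TEMPLATE.format(
        full_question=full_question,
        prior_context=build_prior_context(preceding_subquestions or {}),
        question=question,
    )


print(build_prompt("19. A shop sells ...", "(b) ...", {"a": "Work out ..."}))
```

Because an empty string is substituted when no sub-questions are present, questions without sub-parts get a prompt with no dangling context heading.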
call_agent to pass in the required information.
evaluate accordingly.
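For orientation, here is one way these two functions could fit together. This is a sketch under assumptions, not the tutorial's actual implementation: the signatures, the `llm` callable, the JSON response shape with a "marks" field, and the example dictionary keys are all hypothetical.

```python
import json


def call_agent(llm, system_message, full_question, question, markscheme, answer):
    """Send the full question plus the sub-question data to the model (hypothetical signature)."""
    user_message = json.dumps(
        {
            "full_question": full_question,
            "question": question,
            "markscheme": markscheme,
            "answer": answer,
        },
        indent=2,
    )
    return llm(system_message, user_message)


def evaluate(llm, example, system_message):
    """Run the agent on one example and compare against the correct marks."""
    raw = call_agent(
        llm,
        system_message,
        example["full_question"],
        example["question"],
        example["markscheme"],
        example["answer"],
    )
    predicted = json.loads(raw)["marks"]  # assumes a JSON reply with a "marks" field
    return {
        "predicted_marks": predicted,
        "correct_marks": example["correct_marks"],
        "error": abs(predicted - example["correct_marks"]),
    }
```

Passing the model in as a plain callable keeps the sketch independent of any particular client library, so it can be exercised with a stub before wiring in a real model.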