🔍 Lack of Global Context
These updates have actually regressed overall performance: the mean error is now 0.5.
As usual, let’s take a look and explore why the agent might be failing on the remaining examples 🕵️
Maybe the purely local reasoning has some shortcomings.
Let’s focus on one of the new regressions, to understand why our latest change has disrupted the agent on an example it previously got right very consistently.
Example 20 (b)
📄 PDFs
❓ Paper 3 -> Question 19 (b)
_Mathematics/J560_03_Paper_3_(Foundation_Tier)_Sample_Question_Paper/paper/imgs/page19.png)
☑️ Paper 3 -> Question 19 (b) Markscheme
_Mathematics/J560_03_Paper_3_(Foundation_Tier)_Sample_Question_Paper/markscheme/imgs/page14.png)
❓ Parsed Question [1 Mark]
Explain why he might not want to use Account A.
📝 Student's Answer
He might not want to use Account A because the bank might keep the money forever and he could lose it.
☑️ Parsed Markscheme
He may not want to leave it there for 3 years. Accept any valid reason.
✅ Correct Marks [0/1] Rationale
The markscheme awards 1 mark for a valid reason such as not wanting to lock away funds for 3 years. Here, the stated reason is irrelevant and does not accurately reflect the restrictions of Account A.
🤖 Predicted Marks [1/1] ❌ Rationale
The mark scheme states that any valid reason for not using Account A is acceptable, specifically mentioning not leaving the money in the account for 3 years. The candidate’s answer gives a different, yet valid, reason: that he might lose the money because the bank could keep it forever. Although not identical to the mark scheme’s example, the answer is acceptable as a valid reason, therefore the single mark should be awarded.
🤖 Predicted Marks [x/1] Rationales
simple_agent [0/1] ✅
The answer should explain that although Account A gives slightly more money, Derrick might not want this account because his money is tied up for three years (i.e. it has poor liquidity). The student’s explanation about the bank “keeping the money forever” is not an acceptable financial rationale.
add_markscheme [0/1] ✅
The explanation provided is not a valid reason and does not address the restriction of no withdrawals for three years. Therefore, no mark can be awarded for this part either.
add_marking_guidelines [0/1] ✅
The explanation given (“because the bank might keep the money forever and he could lose it”) does not correctly address why he might not want to use Account A (namely, that his money is locked away for 3 years with no withdrawals allowed).
add_structured_output [0/1] ✅
The answer needed to explain why Account A might not be desirable, which is because funds are locked in for three years and withdrawals are not allowed until the end of the period. The student’s answer states that the bank might keep the money forever and he could lose it, which does not address the actual restriction (lack of liquidity). This is an irrelevant or incorrect reason and does not meet the mark scheme.
align_context [0/1] ✅
The mark scheme accepts any valid reason such as not being able to withdraw the money because it is locked in for 3 years. The candidate’s answer – that the bank might ‘keep the money forever’ and he could lose it – is not a valid explanation for the funds being inaccessible. It does not clearly express that his money is tied up for 3 years. Therefore, this explanation does not meet the requirement and no mark should be awarded.
align_guidelines_and_clarify_reasoning [0/1] ✅
The mark scheme accepts any valid reason such as the inconvenience of having to leave the money invested for 3 years. The candidate’s explanation – that the bank might keep the money forever and he could lose it – is not a valid or relevant reason within the context of the question. Consequently, the answer does not meet the requirement for awarding the 1 mark.
mark_type_reasoning [0/1] ✅
The explanation required a valid reason such as the lack of liquidity (money locked for three years). The candidate’s answer instead gives an unrelated and implausible reason, making it unacceptable for the mark scheme. Therefore, no mark is awarded.
🔀 Include Preceding Context
Let’s first update the system prompt, re-introducing the placeholder for the aligned sub-questions, markschemes and answers, this time calling it {prior_context}, which will only be included when sub-questions are present.
Let’s also include the full question.
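As a rough illustration of the idea, here is a minimal sketch of a prompt template with a {prior_context} slot that is only filled when earlier sub-questions exist. The template wording, the helper names (build_prior_context, build_system_message) and the dictionary keys are all assumptions made for this sketch, not the prompt actually used here.

```python
# Hypothetical template: the wording is illustrative, not the original prompt.
SYSTEM_MESSAGE = """You are marking part {subquestion} of an exam question.

The full question is:
{question}
{prior_context}
Mark only part {subquestion}, using the markscheme provided."""

PRIOR_CONTEXT_HEADER = (
    "\nThe preceding sub-questions, with their markschemes and the "
    "student's answers, are:\n"
)


def build_prior_context(preceding_parts):
    """Render the preceding sub-questions, or an empty string if there are none."""
    if not preceding_parts:
        return ""
    blocks = [
        f"({p['label']}) Question: {p['question']}\n"
        f"    Markscheme: {p['markscheme']}\n"
        f"    Answer: {p['answer']}"
        for p in preceding_parts
    ]
    return PRIOR_CONTEXT_HEADER + "\n".join(blocks) + "\n"


def build_system_message(question, subquestion, preceding_parts):
    """Fill the template; the prior-context slot collapses when nothing precedes."""
    return SYSTEM_MESSAGE.format(
        question=question,
        subquestion=subquestion,
        prior_context=build_prior_context(preceding_parts),
    )
```

For part (a) of a question, preceding_parts would be empty and the prompt contains only the full question; for part (b), the rendered block for part (a) appears between the full question and the final instruction.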
We then update call_agent to pass in the required information.
We also update evaluate accordingly.
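The wiring might look something like the sketch below. Only the names call_agent and evaluate come from the text; their signatures, the collect_prior_context helper, the dictionary keys and the llm callable (assumed to take a prompt string and return an integer mark) are assumptions for illustration.

```python
def collect_prior_context(sub_questions, target_label):
    """Return the parts that come before `target_label`, in their original order."""
    prior = []
    for part in sub_questions:
        if part["label"] == target_label:
            break
        prior.append(part)
    return prior


def call_agent(llm, full_question, target, prior_parts):
    """Build the prompt for one sub-question and hand it to `llm` (hypothetical signature)."""
    prior_context = "".join(
        f"({p['label']}) {p['question']}\n"
        f"Markscheme: {p['markscheme']}\nAnswer: {p['answer']}\n"
        for p in prior_parts
    )
    prompt = (
        f"Full question:\n{full_question}\n"
        # The preceding-parts section is omitted entirely when nothing precedes.
        + (f"Preceding parts:\n{prior_context}" if prior_context else "")
        + f"Now mark part ({target['label']}): {target['question']}\n"
        f"Markscheme: {target['markscheme']}\nAnswer: {target['answer']}\n"
    )
    return llm(prompt)


def evaluate(llm, full_question, sub_questions, target_label, correct_marks):
    """Mark one sub-question and return the absolute error against the true mark."""
    target = next(p for p in sub_questions if p["label"] == target_label)
    prior = collect_prior_context(sub_questions, target_label)
    predicted = call_agent(llm, full_question, target, prior)
    return abs(predicted - correct_marks)
```

Passing the model in as a plain callable keeps the sketch testable with a stub; in practice it would wrap whichever client the agent already uses.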