🔍 Misunderstanding Mark Types
Seven out of ten examples have a perfect error of 0, while the remaining three have an error of 1. Let’s explore each of these failures in more detail, to see why they occur and how we can rectify them.
Example 207
📄 PDFs
❓ Paper 1 -> Question 2
_Mathematics/J560_01_Paper_1_(Foundation_Tier)_Sample_Question_Paper/paper/imgs/page2.png)
☑️ Paper 1 -> Question 2 Markscheme
_Mathematics/J560_01_Paper_1_(Foundation_Tier)_Sample_Question_Paper/markscheme/imgs/page5.png)
❓ Parsed Question [2 Marks]
Write these in order, smallest first: 0.34, 1/3, 3.5%
📝 Student's Answer
1/3, 0.34, 3.5%
☑️ Parsed Markscheme
3.5%, 1/3, 0.34
Part marks and guidance:
- B1 for 1/3 = 0.33… or 33…%
- or B1 for 0.34 = 34%
- or B1 for changing 3.5% to 0.035
- or SC1 for 1/3, 0.34, 3.5%
✅ Correct Marks [1/2] Rationale
The candidate’s order is 1/3, 0.34, 3.5%. According to the markscheme, this earns SC1 (special case, partially correct ordering) for 1 mark.
🤖 Predicted Marks [0/2] Rationale
The student’s ordering is incorrect because smallest first should be 3.5%, 1/3, 0.34, but the student provided 1/3, 0.34, 3.5%.
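To see why this example is about mark types rather than arithmetic, it helps to check the ordering directly. The sketch below is only an illustration (the variable names are hypothetical and not part of the original pipeline): it converts the three values to exact fractions, sorts them, and confirms that the student's order is wrong as a full answer while still matching the SC1 line of the markscheme word for word.

```python
from fractions import Fraction

# The three values from the question, converted to exact fractions.
values = {
    "0.34": Fraction(34, 100),
    "1/3": Fraction(1, 3),
    "3.5%": Fraction(35, 1000),  # 3.5% = 0.035
}

# Sort the labels by numeric value, smallest first.
correct_order = sorted(values, key=values.get)

student_order = ["1/3", "0.34", "3.5%"]
# The ordering quoted in the markscheme's SC1 line.
sc1_order = ["1/3", "0.34", "3.5%"]

print(correct_order)                   # ['3.5%', '1/3', '0.34']
print(student_order == correct_order)  # False: not worth the full 2 marks
print(student_order == sc1_order)      # True: matches the SC1 special case
```

So the agent's arithmetic is right, but it overlooks that the markscheme explicitly awards 1 mark for this particular wrong ordering.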
Example 261
📄 PDFs
❓ Paper 1 -> Question 19
_Mathematics/J560_01_Paper_1_(Foundation_Tier)_Sample_Question_Paper/paper/imgs/page17.png)
☑️ Paper 1 -> Question 19 Markscheme
_Mathematics/J560_01_Paper_1_(Foundation_Tier)_Sample_Question_Paper/markscheme/imgs/page13.png)
❓ Parsed Question [5 Marks]
Peter makes a large amount of pink paint by mixing red and white paint in the ratio 2 : 3.
• Red paint costs £80 per 10 litres.
• White paint costs £5 per 10 litres.
• Peter sells his pink paint in 10-litre tins for £60 per tin.
Calculate how much profit he makes for each tin he sells.
You must show your working.
📝 Student's Answer
Peter needs 4 litres of red paint and 6 litres of white paint for every 10-litre batch (because 2 out of 5 parts is red, 3 out of 5 parts is white).
☑️ Parsed Markscheme
25 with correct working (5 marks)
“Correct working” requires evidence of at least M1M1
- M1 for 10 × (2/5) = 4 litres red or for 10 × (3/5) = 6 litres white
- M1 for red costs £8 per litre or for white costs £0.50 per litre
- M1 for cost of one 10-litre can is their 4 × their 8 + their 6 × their 0.5
- M1 for 60 – their 35
- M1 for 2 : 3 = 20 litres red : 30 litres white
- M1 for 2 × £80 + 3 × £5 = £175
- M1 for (their 175) ÷ 5 = 35
- M1 for 60 – their 35
- M1 for cost of one 10-litre can is their 4 × their 8 + their 6 × their 0.5
✅ Correct Marks [1/5] Rationale
Awarding 1 mark (M1) for correctly finding that a 10-litre tin contains 4 litres of red and 6 litres of white from the 2:3 ratio. No further steps towards the final profit are shown, so only this 1 mark is earned.
🤖 Predicted Marks [0/5] Rationale
The student’s response only states that Peter needs 4 litres of red paint and 6 litres of white paint per 10-litre batch. It does not show any further working or provide the calculation of the cost of the paints or the profit per tin. Therefore, the answer does not meet the full requirements for the calculation and is awarded 0 marks.
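For reference, the full chain of working that the first method in the markscheme describes can be written out step by step. This is a minimal sketch using only the numbers from the question (the variable names are hypothetical); each commented step corresponds to one M1 line, and the student's answer covers only the first of them.

```python
# Step 1 (first M1): split a 10-litre tin in the ratio 2 : 3.
red_litres = 10 * 2 / 5     # 4 litres of red
white_litres = 10 * 3 / 5   # 6 litres of white

# Step 2 (second M1): cost per litre of each colour.
red_cost_per_litre = 80 / 10    # £8 per litre
white_cost_per_litre = 5 / 10   # £0.50 per litre

# Step 3 (third M1): cost of the paint in one 10-litre tin.
tin_cost = red_litres * red_cost_per_litre + white_litres * white_cost_per_litre

# Step 4 (fourth M1): profit per tin at a selling price of £60.
profit = 60 - tin_cost

print(f"Cost per tin: £{tin_cost:.2f}")   # Cost per tin: £35.00
print(f"Profit per tin: £{profit:.2f}")   # Profit per tin: £25.00
```

Because each M1 is awarded for its own step, stopping after step 1 still earns 1 mark, which is what the agent missed.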
Example 132 (c)
📄 PDFs
❓ Paper 2 -> Question 10 (c)
_Mathematics/J560_02_Paper_2_(Foundation_Tier)_Sample_Question_Paper/paper/imgs/page8.png)
_Mathematics/J560_02_Paper_2_(Foundation_Tier)_Sample_Question_Paper/paper/imgs/page9.png)
☑️ Paper 2 -> Question 10 (c) Markscheme
_Mathematics/J560_02_Paper_2_(Foundation_Tier)_Sample_Question_Paper/markscheme/imgs/page7.png)
❓ Parsed Question [2 Marks]
A newspaper headline reads: ‘High temperatures make more people buy ice cream!’ Does the graph above prove this claim? Give a reason for your decision.
📝 Student's Answer
No. The graph alone does not completely prove the claim, because we cannot be certain this trend always holds true in all situations.
☑️ Parsed Markscheme
Answer: No, because there may be other factors involved (2 marks)
Guidance: B1 for ‘No’, with partial reason.
✅ Correct Marks [1/2] Rationale
The mark scheme gives 2 marks if the student states “No, there may be other factors involved.” Here, only a partial reason is given, so 1 mark is awarded for denying that the graph proves the claim but not fully explaining other possible factors.
🤖 Predicted Marks [2/2] Rationale
No Rationale Given
Thoughts
Overall, it’s clear that the agent is unable to properly make sense of the different mark types, such as B1, SC1, M1, A1, etc. This is not surprising, as we’ve never explained what these terms mean in the system prompt!
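To make the gap concrete, here is a rough sketch of the kind of vocabulary the agent is missing. The descriptions in `MARK_TYPES` are paraphrased assumptions based on how such codes are commonly defined in exam-board guidance, not the official wording; the authoritative definitions are the ones printed in the markscheme PDFs themselves. The `explain` helper is hypothetical and exists only for illustration.

```python
import re

# Assumed, paraphrased meanings of the mark-type prefixes (not official text).
MARK_TYPES = {
    "M": "method mark, for using a correct method",
    "A": "accuracy mark, for an accurate answer that follows from the method",
    "B": "independent mark, for a correct answer or stage regardless of method",
    "SC": "special-case mark, for a specific response judged worth some credit",
}


def explain(code: str) -> str:
    """Expand a code such as 'B1' or 'SC1' into a readable description."""
    match = re.fullmatch(r"(SC|[MAB])(\d+)", code)
    if match is None:
        raise ValueError(f"unrecognised mark code: {code!r}")
    kind, marks = match.groups()
    return f"{code}: {MARK_TYPES[kind]} (worth {marks})"


for code in ["B1", "SC1", "M1", "A1"]:
    print(explain(code))
```

Without something equivalent to this in its context, the agent has to guess that a line such as "SC1 for 1/3, 0.34, 3.5%" awards a mark for an otherwise incorrect answer.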
🔀 Add Marking Guidelines
Let’s add the general marking guidelines to the system prompt, so that the agent knows what each of these mark terms means and how to interpret the markscheme for each question. The marking guidelines can be extracted from the beginning of any of the markscheme PDF files. Let’s store them in a separate variable, which will make it easier to parameterize the inclusion of the guidelines in future experiment iterations.
🧪 Rerun Tests
The mean error has now dropped from 0.3 to 0.2.
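As a quick sanity check, the 0.3 figure is consistent with the per-example errors quoted at the start of this section (seven examples with an error of 0 and three with an error of 1). The snippet below is a sketch assuming the reported score is a plain mean over those ten examples; the list is reconstructed from that description rather than taken from the actual logs.

```python
# Per-example errors before adding the guidelines, reconstructed from the
# description above: seven perfect scores and three errors of 1.
errors_before = [0] * 7 + [1] * 3

mean_error_before = sum(errors_before) / len(errors_before)
print(mean_error_before)  # 0.3
```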
Let’s take a look at the traces to ensure that the system message template has been implemented correctly, and that each LLM call has the template variables in its system message populated correctly. It seems as though everything was implemented correctly, and the per-LLM system messages look good ✅
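The same kind of check can also be approximated in plain Python, independently of any tracing tool. The following is a minimal sketch, not the code used in this walkthrough: `GENERAL_GUIDELINES`, `SYSTEM_TEMPLATE`, and `render` are hypothetical names, and the guideline and prompt text are placeholders. It keeps the guidelines in their own variable, fills the template, and fails loudly if any template variable is left unpopulated.

```python
import string

# Placeholder text; in practice this would be extracted from a markscheme PDF.
GENERAL_GUIDELINES = "(general marking guidelines go here)"

# Hypothetical system message template with two variables.
SYSTEM_TEMPLATE = (
    "Your task is to award marks for a student's answer.\n\n"
    "General marking guidelines:\n{general_guidelines}\n\n"
    "Markscheme for this question:\n{markscheme}\n"
)


def render(template: str, **variables: str) -> str:
    """Fill the template, raising if any placeholder has no value."""
    fields = {
        name for _, name, _, _ in string.Formatter().parse(template) if name
    }
    missing = fields - variables.keys()
    if missing:
        raise KeyError(f"unfilled template variables: {sorted(missing)}")
    return template.format(**variables)


system_message = render(
    SYSTEM_TEMPLATE,
    general_guidelines=GENERAL_GUIDELINES,
    markscheme="3.5%, 1/3, 0.34",
)
print(system_message)
```

Keeping the guidelines in a separate variable means a later experiment can swap in an empty string or a different version without touching the template itself.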
Let’s explore the remaining errors in the next iteration 🔁