4: Better Align Context
The full script for running this iteration can be found here.
🔍 Context Alignment
Let’s take a look, and see if we can work out why the agent might be failing on some examples 🕵️
As a concrete failure case, let’s take Example 132. For this particular question, there are 6 sub-questions (a.i, a.ii, b.i, b.ii, b.iii, c), and we’re asking the LLM to do a lot in a single shot:
- understand all 16 points in the general marking guidelines
- understand all 6 of the sub-questions
- understand all 6 of the student’s answers to the sub-questions
- understand the markscheme’s reasoning for all 6 of these sub-questions
More importantly, the system prompt doesn’t align the relevant information together. The agent receives the information like so:
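Roughly speaking (an illustrative sketch, not the exact prompt), everything is grouped by type rather than by sub-question:

```text
General marking guidelines (16 points)
Sub-questions:        a.i, a.ii, b.i, b.ii, b.iii, c
Markscheme reasoning: a.i, a.ii, b.i, b.ii, b.iii, c
Student answers:      a.i, a.ii, b.i, b.ii, b.iii, c
```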
Let’s update the system prompt so that the information is better aligned, more like the following:
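(Again, this is just an illustrative sketch of the structure, not the exact prompt.)

```text
General marking guidelines (16 points)
Sub-question a.i:   question | markscheme reasoning | student answer
Sub-question a.ii:  question | markscheme reasoning | student answer
...
Sub-question c:     question | markscheme reasoning | student answer
```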
🔀 Better Align Context
So, let’s go ahead and improve the system prompt such that relevant information is closer together, and see if it helps 🤞
First, let’s abstract this into a `{questions_markscheme_and_answers}` placeholder:
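As a rough sketch (the surrounding prompt wording and the `{general_guidelines}` placeholder are assumptions here), the system message template might look something like this:

```python
# Illustrative sketch only: the aligned per-sub-question context is collapsed
# into a single placeholder, to be populated for each example at call time.
system_message_template = """
You are a marking assistant. Award marks for each sub-question below,
following the general marking guidelines.

General marking guidelines:
{general_guidelines}

{questions_markscheme_and_answers}
"""
```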
Let’s then update `call_agent` accordingly:
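Below is a minimal sketch of the updated function, assuming an OpenAI-style chat client and that `sub_questions` is a list of dicts with `key`, `question`, `markscheme` and `answer` fields; the real signature in the script may differ:

```python
from openai import OpenAI

client = OpenAI()

def call_agent(general_guidelines: str, sub_questions: list[dict]) -> str:
    # Group each sub-question with its markscheme reasoning and the student's
    # answer, so that related context sits together in the prompt.
    aligned = "\n\n".join(
        f"Sub-question {sq['key']}:\n"
        f"Question: {sq['question']}\n"
        f"Markscheme reasoning: {sq['markscheme']}\n"
        f"Student answer: {sq['answer']}"
        for sq in sub_questions
    )
    system_message = system_message_template.format(
        general_guidelines=general_guidelines,
        questions_markscheme_and_answers=aligned,
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": "Please mark each sub-question, and state the marks awarded."},
        ],
    )
    return response.choices[0].message.content
```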
Let’s also update our `evaluate` function, so that we pass the `sub_questions` into the `call_agent` function:
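A minimal sketch of the corresponding change, assuming each example carries its general guidelines, its sub-questions, and the correct marks per sub-question, and using a hypothetical `parse_marks` helper to extract the predicted marks from the agent’s response:

```python
def evaluate(example: dict) -> float:
    sub_questions = example["sub_questions"]
    # Pass the sub-questions through to call_agent so it can build the
    # aligned {questions_markscheme_and_answers} block.
    response = call_agent(example["general_guidelines"], sub_questions)
    predicted_marks = parse_marks(response)  # hypothetical helper
    # Mean absolute error across this example's sub-questions.
    errors = [
        abs(predicted_marks[sq["key"]] - sq["correct_marks"])
        for sq in sub_questions
    ]
    return sum(errors) / len(errors)
```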
🧪 Rerun Tests
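As before, we rerun the evaluation over the test set and average the per-example error (an illustrative sketch, assuming a `test_set` list of examples):

```python
mean_error = sum(evaluate(example) for example in test_set) / len(test_set)
print(f"Mean error: {mean_error:.2f}")
```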
The mean error has now dropped to 0.3.
Let’s take a look at the traces to ensure that the system message template has been implemented correctly, and that the template variables in each LLM call’s system message are populated as expected.
It seems as though everything was implemented correctly, and the per-LLM system messages look good ✅
Again, let’s explore what’s going wrong in the next iteration 🔁