Motivation

According to QA Education, teachers spend a whole day each week on marking, and the majority say they do not spend enough time teaching as a result. Let’s try to fix this using AI!

We’ll use LLMs to build an app that asks students questions and marks their answers against a known markscheme. We have to start somewhere, so let’s start with some practice exam papers for high-school-level maths.

Data

To kick things off, we’ve parsed three practice exam papers from OCR’s Sample Assessment Material for GCSE Maths, and used OpenAI’s o1 model to generate synthetic student answers. For each question, o1 wrote answers that it believes merit each possible number of marks, which gives us a target mark for every synthetic answer.

The details for these preliminary steps are not important, but you’re welcome to dig deeper if interested. Feel free to check out:

  • the script and data for the parsed papers and markschemes

  • the script and data for the synthetic student answers and target marks to award

  • the script and data for building the resulting test set, used for our agent evaluations
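To give a rough feel for what such a test set might look like, here is a minimal sketch in Python. It is not the actual script linked above: the `TestExample` fields, the `build_test_set` helper, and the sample question and answers are all made up for illustration, and it assumes that "each possible number of marks" runs from 0 up to the marks available for the question.

```python
from dataclasses import dataclass


@dataclass
class TestExample:
    """One row of a hypothetical test set: an answer plus the mark it should get."""
    question: str
    markscheme: str
    available_marks: int
    student_answer: str
    target_marks: int


def build_test_set(questions, answers):
    """Pair every question with one synthetic answer per possible mark.

    questions: list of dicts with keys 'id', 'question', 'markscheme',
               'available_marks' (hypothetical layout).
    answers:   dict mapping (question_id, target_marks) to answer text.
    """
    examples = []
    for q in questions:
        # Assumption: possible marks are 0, 1, ..., available_marks.
        for target in range(q["available_marks"] + 1):
            answer = answers.get((q["id"], target))
            if answer is None:
                continue  # no synthetic answer was generated for this mark
            examples.append(
                TestExample(
                    question=q["question"],
                    markscheme=q["markscheme"],
                    available_marks=q["available_marks"],
                    student_answer=answer,
                    target_marks=target,
                )
            )
    return examples


# Made-up example data, not taken from the OCR papers.
questions = [
    {
        "id": "q1",
        "question": "Solve 3x + 5 = 20.",
        "markscheme": "M1 for 3x = 15; A1 for x = 5.",
        "available_marks": 2,
    }
]
answers = {
    ("q1", 0): "x = 3",
    ("q1", 1): "3x = 15, so x = 45",
    ("q1", 2): "3x = 15, so x = 5",
}

test_set = build_test_set(questions, answers)
```

Each entry carries both the answer and its target mark, so an agent's awarded mark can later be compared directly against the target.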

Steps

For the demo, we’ll make use of this synthetic data, and walk through the following three steps:

  1. Build a Usage Dashboard 📊
  2. Upload Datasets 🗂️
  3. Iterate and Improve our Agent 🔁

Step 3 is the most involved: we’ll perform several iterations to really get the AI agent flying 🪁

So, without further ado… let’s dive in! 🤿