Usage Dashboard
Let’s assume our app has been deployed for a few weeks now, and we’ve been tracking the daily usage 📈
For now, let’s assume that the users answer the questions, and that humans then mark these answers asynchronously, all of which is captured in the usage data.
This manual marking process is both time-consuming and expensive, and we’d like to automate this part of our app using AI 🤖
Historic Data
To begin with, let’s get all of our historic usage data added into our Unify interface. Let’s assume this data has already been captured and exported to usage_data.json, and so we now just need to migrate it over to our new interface.
Upload the Data
First things first, let’s add the necessary imports, and then activate our project.
Let’s also set the context to "Usage", as we’ll be uploading and analyzing usage data in this section.
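Concretely, this setup might look as follows. This is a minimal sketch: the project and context names come from the text above, while the exact unify function names are assumptions based on the SDK.

```python
import unify

# Activate the project, so subsequent logs are routed to it.
unify.activate("MarkingAssistant")

# Scope everything we upload in this section under the "Usage" context.
unify.set_context("Usage")
```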
Let’s now go ahead and download the usage_data.json file.
Let’s then load this data into Python.
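For example, something like the following would work. This is a hedged sketch: the URL is a placeholder for the download link referenced above, not a real address.

```python
import json

import requests

# Placeholder URL; substitute the usage_data.json link referenced above.
url = "https://example.com/usage_data.json"
response = requests.get(url)
response.raise_for_status()
with open("usage_data.json", "wb") as f:
    f.write(response.content)

# Load the historic usage logs into Python.
with open("usage_data.json") as f:
    usage_data = json.load(f)

print(f"Loaded {len(usage_data)} logs")
```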
Finally, let’s upload this production traffic into our new interface, to get a complete picture. This will take a few minutes, as there are 10,000 logs to upload.
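Continuing from the snippets above, a minimal upload loop might look like this, assuming each entry is a flat dict of log fields that can be passed to unify.log as keyword arguments.

```python
# Upload each historic log; with 10,000 logs this takes a few minutes.
for entry in usage_data:
    unify.log(**entry)
```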
The full script for this upload can be found here.
Analyze the Data
Rename Tab and Set Context
We can already start to explore the data while the upload is happening.
Let’s open our new “MarkingAssistant” project, and rename the default tab1 to Usage. Let’s then also set the context of the table to “Usage”, so that the tab is accessing the correct data.
Find Most Active Users
We can then start to explore the data. Let’s first explore which students are making the most use of the platform, answering the highest volume of questions.
Via the Table
We can remove the viewer tile for now (we don’t need it yet), and also expand the table tile.
Let’s then click group by above the email field. This will group the logs on a student-by-student basis, and each group can be unfolded to see the raw data.
The reduction metrics for each numeric, boolean and datetime column in a group are shown in green in the table, and values shared across the entire group are shown in light blue (these are not shown in the GIFs, which were created before this feature was added).
By default the mean is plotted for the reduction metrics, but this can be changed at the bottom of the table. Let’s change the metric to count instead,
and let’s order the groups in descending count order. It doesn’t matter which column we use for the group sorting, as they all share the same count value, which refers to the number of logs (rows) in the group. In this case we’ll just go with the timestamp count.
With this configuration applied, the students making the most use of the platform are shown at the top, and those making the least use are shown at the bottom.
Via Plotting
We can also plot the distribution of usage across students as a bar chart. Let’s first make some room for the plot on the right-hand side by resizing the table. We then add the plot tile, select the bar chart type, and choose student/email for the x axis (the unique student identifier) and student/email -> count for the y axis. A separate bar is then plotted for each unique email address, with the number of logs on the y axis.
Average Usage Time
Let’s do a slightly more complicated example, and find the average time of day when males vs females are on the platform. Let’s remove the grouping and filtering by clicking on the green icons. Let’s then create a new derived column for the time of day, by applying the time() function to our datetime column. If we keep mean as our reduction statistic and group by gender, we’ll then get the average time of day when males vs females are on the platform.
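To make the computation concrete, here is a plain-Python equivalent of what the derived column plus grouping produce. This is a sketch only, not the interface’s implementation, and the nested student/gender field names are assumptions about the log schema.

```python
from collections import defaultdict
from datetime import datetime

# Average time of day (seconds since midnight) per gender,
# using the usage_data loaded earlier.
seconds_by_gender = defaultdict(list)
for entry in usage_data:
    ts = datetime.fromisoformat(entry["timestamp"])  # assumes ISO timestamps
    seconds_by_gender[entry["student"]["gender"]].append(
        ts.hour * 3600 + ts.minute * 60 + ts.second
    )

for gender, secs in seconds_by_gender.items():
    mean = sum(secs) / len(secs)
    print(f"{gender}: {int(mean // 3600):02d}:{int(mean % 3600 // 60):02d}")
```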
These are just examples. We can manipulate the data in all kinds of other ways. If you have any other ideas, then give them a try! 📊
Live Traffic
Finally, let’s imagine we have live production traffic streaming in from our app, and we’d like to have a dashboard to monitor this live data 📈
Stream the Data
Let’s first initialize the asynchronous logger, for much faster (non-blocking) logging (without capturing any of the created log ids).
We can then simulate the live production traffic by sampling from the existing usage dataset, and updating the timestamp to be the current time, with some noise applied.
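Putting these two steps together, the streaming script might look something like the sketch below. It assumes unify.initialize_async_logger is available in your SDK version, reuses the usage_data loaded earlier, and the noise level and logging rate are illustrative choices.

```python
import random
import time
from datetime import datetime, timedelta

import unify

# Non-blocking logging; we deliberately don't keep the returned log ids.
unify.initialize_async_logger()

# Simulate live traffic: sample a historic log, overwrite its timestamp
# with "now" plus some noise, and log at a steady rate.
while True:
    entry = dict(random.choice(usage_data))
    noise = timedelta(seconds=random.gauss(0, 10))
    entry["timestamp"] = (datetime.now() + noise).isoformat()
    unify.log(**entry)
    time.sleep(random.expovariate(5))  # ~5 logs per second on average
```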
The full script for streaming the data can be found here.
Monitor the Data
We can then create a dashboard to monitor this live data as it is streaming in. Let’s dive in!
Filter for Today
Let’s first filter the log timestamps so we only get those which have been logged today or later, so we don’t get any of the historic data included in the table.
Stream Table
Let’s then also turn on streaming mode for the table, and then we’ll see the new logs flash green as they appear in the table. The most recent logs are shown at the top of the table by default.
Create Usage Histogram
Let’s now create a plot to monitor how the traffic is changing over time.
We can create a new plot tile, select the histogram type, select timestamp for the x axis, and then select the number of buckets we’d like to use. Let’s go with 50 buckets.
Stream Plot
Let’s now turn on streaming mode for both the table and the plot, such that the plot is also automatically updated whenever new data is available.
Dynamic Time Window
The plot and table will continue to grow unbounded in the current configuration. Let’s update the filter on the timestamp column, such that we’re always only retrieving logs from the past 2 minutes, relative to the current time. Now, the number of plotted logs remains steady at around 500, with older logs falling out of scope once they’re more than 2 minutes old.
Freeze Data
Let’s assume we want to spot trends in the recent usage data. Let’s first revert the timestamp filter back to “today and later”, so we’re still not including the large amount of historic data, but we’re no longer limiting it to only the past 2 minutes. Let’s also freeze the data, so no new data is added into the interface beyond the point in time when the freeze was clicked (despite the streaming still occurring behind the scenes). Freezing the data can help us to focus on trends with a fixed data sample, without a constant stream of new data distorting things.
Filter for Student
So, let’s play around with the data a bit. For example, if we want to look at traffic for a particular student, we can simply filter for this student in the table, and the plot will also be updated automatically.
Filter for Paper and Question
Alternatively, we could filter for a particular paper and question number, and show only the traffic which is not receiving full marks for that question.
Multiple Plots
If we’d like to view several of these plots at the same time, then we can simply make a clone of the table (or create a new one), make the filtering changes to this new table, and then create another plot which references this second table.
Let’s add two plots: a histogram for monitoring the traffic and a bar chart to show the question difficulties, both plotted over the past 24 hours.
Histogram
Let’s first add the histogram for monitoring the traffic.
Bar Chart
Let’s now add the bar chart for monitoring the question difficulties.
However, we’ll first need to create two derived columns: performance, for the ratio of marks achieved out of the total available, and paper_question_num, for a unique paper + question value. Let’s create these derived columns.
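In plain Python, these two columns compute something like the following. The marks, available_marks, paper and question_num field names are assumptions about the log schema, used purely for illustration.

```python
# Fraction of the available marks achieved for the question.
def performance(entry):
    return entry["marks"] / entry["available_marks"]

# Unique identifier combining the paper and the question number.
def paper_question_num(entry):
    return f"{entry['paper']}_{entry['question_num']}"
```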
We can now use these derived columns to plot the desired bar chart, remembering to also apply the 24 hours filter on the table.
Hide Tables
If there are any tables on screen that you don’t need to see, then you can simply hide them. Let’s hide our two tables, whose sole purpose is to define the data for the two plots. We can also turn off edit mode and turn off interactive mode, if we would like our interface to instead act as a static dashboard.
Save Interface
If you’re happy with your configuration, you can explicitly save the tab. We can then close the browser, and when we open up the MarkingAssistant project and select our Usage tab, we see the same configuration we previously saved.
Try your Own
This set of examples has been just a simple taster of the kinds of things you can quickly hack together in an interface for logging, in this case focusing on the live usage data.
You should play around with the interface yourself, and explore other useful representations you can create by juggling things around and applying filters, sorting, grouping, and various plots. If you find a useful representation of the data, you can save the tab and come back to that view of the data at a later stage 📊