Summary of results across three data sets for item response theory (IRT), temporal IRT (TIRT), hierarchical IRT (HIRT), and Deep Knowledge Tracing (DKT) models.
These days, when people talk about artificial intelligence, there’s a lot of excitement around deep learning. AlphaGo, the program that defeated 9-dan Go master Lee Sedol, incorporates deep learning, which meant that its programmers didn’t need to hand-code expert Go strategy. They gave AlphaGo a lot of Go matches, and it learned how to play well from them.
Deep learning has also shown impressive results in areas from computer vision to bioinformatics to linguistics. Deep learning helps Facebook understand the words people post there in more than 20 languages, and Amazon uses it to have conversations through Echo.
So deep learning is proving to be a popular way to understand how people write, speak, see, and play, but how good is it at modeling how people learn?
Last year, a team led by Chris Piech of Stanford University trained a recurrent neural network to model what students know, an approach they call Deep Knowledge Tracing. The idea is that, just as AlphaGo’s programmers didn’t need to spell out how to play the game, Deep Knowledge Tracing can make sense of what’s being learned on its own, without human help. Using a public data set from ASSISTments, which guides students through math problem-solving, Deep Knowledge Tracing showed promising initial results.
There are other ways of modeling what students know. Item Response Theory, for example, has been around since the 1950s. It has been extended over the last decade to incorporate how people learn over time as well as expert human knowledge about the hierarchy of concepts being learned.
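To make the idea concrete, here is a minimal sketch of the simplest member of the family, the one-parameter (Rasch) form of Item Response Theory. It is an illustration only and not necessarily the variant used in the comparison described below. The function names, the learning rate, and the sample responses are all made up for this example.

```python
import math


def p_correct(ability, difficulty):
    """One-parameter IRT: probability that a student answers an item correctly.

    The probability is a logistic function of (ability - difficulty).
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def estimate_ability(responses, steps=200, lr=0.1):
    """Estimate one student's ability by gradient ascent on the log-likelihood.

    `responses` is a list of (item_difficulty, answered_correctly) pairs.
    Item difficulties are assumed to be known already (hypothetical setup).
    """
    ability = 0.0
    for _ in range(steps):
        # Gradient of the log-likelihood: sum of (observed - predicted).
        grad = sum(
            (1.0 if correct else 0.0) - p_correct(ability, difficulty)
            for difficulty, correct in responses
        )
        ability += lr * grad
    return ability


# Hypothetical student: right on an easy and a medium item, wrong on a hard one.
responses = [(-1.0, True), (0.0, True), (1.0, False)]
ability = estimate_ability(responses)
```

A student whose ability equals an item's difficulty has a 50 percent chance of answering it correctly. The temporal and hierarchical extensions build on this same core by letting ability change over time or by tying related concepts together.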
What’s the best way to predict what students know and don’t know, based on their previous answers and interactions?
Four Knewton data scientists — Kevin Wilson, Yan Karklin, Bojian Han, and Chaitanya Ekanadham — took a closer look at Deep Knowledge Tracing, comparing it with three models of how people learn built upon Item Response Theory. In addition to a classic Item Response Theory (IRT) model, the Knewton data science team used a temporal IRT model (called TIRT in the accompanying charts) and a hierarchical one (shown as HIRT).
The Knewton team used three collections of anonymous student interaction data: ASSISTments, the Bridge to Algebra 2006–2007 data set from the KDD Cup, and millions of anonymized student interactions collected by Knewton.
With all three data sets, the Knewton team found that the Item Response Theory methods “consistently matched or outperformed” Deep Knowledge Tracing. Not only were the Item Response Theory approaches at least as good at predicting what people know, they were also easier to build and tune, “making them suitable candidates for real-world applications” such as adaptive learning platforms.
With Deep Knowledge Tracing, meanwhile, the Knewton team “found that computation time and memory load were prohibitive when training on tens of thousands of items. These issues could not be mitigated by reducing dimensionality without significantly impairing performance.”
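A rough sketch of why the number of items matters, under stated assumptions: the original Deep Knowledge Tracing work describes feeding the network a one-hot vector for each (item, correctness) pair, so the input is twice as wide as the item count. The index layout, the helper names, and the hidden size of 200 below are assumptions for illustration, and the weight count covers only the input layer of a plain recurrent network.

```python
def one_hot_interaction(item_id, correct, num_items):
    """Encode one (item, correctness) pair as a vector of length 2 * num_items.

    Assumed layout: the first half marks incorrect answers,
    the second half marks correct answers.
    """
    if not 0 <= item_id < num_items:
        raise ValueError("item_id out of range")
    vec = [0] * (2 * num_items)
    vec[item_id + (num_items if correct else 0)] = 1
    return vec


def input_weight_count(num_items, hidden_size):
    """Input-to-hidden weights in a plain RNN layer fed with this encoding."""
    return 2 * num_items * hidden_size


# Tiny example: item 1 of 3, answered correctly.
encoded = one_hot_interaction(1, True, 3)

# Hypothetical scale: 50,000 items and 200 hidden units.
weights = input_weight_count(50_000, 200)
```

With tens of thousands of items, the input layer alone runs to tens of millions of weights in this sketch, which gives a feel for how training cost grows with the size of the item bank.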
In other words, deep learning still has a way to go to match established ways of modeling student learning.
For more details, read “Back to the Basics: Bayesian extensions of IRT outperform neural networks for proficiency estimation” or visit the International Educational Data Mining Society conference in Raleigh on July 1.
And if you want to reproduce our results, you can find code, links to the data sets, and instructions on GitHub.