How does Knewton’s Proficiency Model estimate student knowledge in alta?
Accurately estimating a student’s knowledge is one of the core challenges of adaptive learning.
By understanding what a student knows and doesn’t know, adaptive learning technology is able to deliver a learning experience that will help the student achieve mastery. Understanding the student knowledge state is also essential for delivering accurate, useful analytics to students and instructors.
We refer to our data-driven mathematical model for estimating a student’s knowledge state as Knewton’s Proficiency Model. This model lies at the core of our ability to deliver lasting learning experiences to students using alta.
How does our Proficiency Model estimate student knowledge? Answering that question begins by looking at its inputs, which include:
- The observed history of a student’s interactions, including which questions the student answered correctly and incorrectly, the instructional material they studied, and when they performed these activities.
- Content properties, such as the difficulty of a question the student is answering.
- The structure of the Knewton Knowledge Graph, in particular the prerequisite relationships between learning objectives.
The model’s outputs represent the student’s proficiencies in all of the learning objectives in the Knowledge Graph at a given point in time. So what’s in between the inputs and the outputs?
A basis in Item Response Theory
The foundation for our Proficiency Model is a well-known educational testing theory known as Item Response Theory (IRT).
One important aspect of IRT is that it benefits from network effects — that is, we learn more about the content and the students interacting with it as more people use the system. When a student answers a difficult question correctly, the model’s estimated proficiency for that student should be higher than it would be if the student had correctly answered an easy question. But how can we determine each question’s difficulty level? Only by observing how large numbers of diverse students performed when responding to those questions.
With this data in hand, we can infer student proficiency (or weakness) more accurately and efficiently, and deliver content that is targeted and effective.
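To make the easy-versus-difficult intuition concrete, here is a minimal sketch using the one-parameter logistic (Rasch) form of IRT. The function names, the difficulty values, and the single gradient-step update are illustrative assumptions, not Knewton's implementation.

```python
import math

def p_correct(proficiency: float, difficulty: float) -> float:
    """Rasch (1PL) IRT: P(correct) = sigmoid(proficiency - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(proficiency - difficulty)))

def update_proficiency(proficiency, difficulty, correct, step=0.5):
    """One gradient-ascent step on the log-likelihood of a single response.

    For the Rasch model the gradient is (observed - predicted),
    so a surprising correct answer moves the estimate further.
    """
    observed = 1.0 if correct else 0.0
    return proficiency + step * (observed - p_correct(proficiency, difficulty))

# Hypothetical difficulties: one easy question, one hard question.
easy, hard = -1.0, 1.5

# Same student (starting estimate 0.0), same correct response, different questions.
after_easy = update_proficiency(0.0, easy, correct=True)
after_hard = update_proficiency(0.0, hard, correct=True)
```

In this sketch `after_hard` ends up larger than `after_easy`: a correct answer to the hard question was less expected, so it carries more evidence of proficiency.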
Moving beyond the limits of IRT
Because IRT was designed for adaptive testing — a learning environment in which a student’s knowledge remains fixed — it does not meet all of the requirements of adaptive learning, an environment in which the student’s knowledge is continually changing. In a model based on IRT, a student’s older responses make the same impact on the student’s proficiency level as their more recent responses. While this is fine in a testing environment, in which students aren’t typically provided feedback or instruction, it becomes a problem in an adaptive learning environment.
In an adaptive learning environment, we inherently expect that students’ knowledge will change. As a result, we want to give more weight to recent responses than older ones — allowing for the possibility of an “Aha!” moment along the way.
To correct for the limitations of IRT, Knewton has built temporal models that weight a student’s recent responses more heavily than their older ones when determining proficiency, providing a more accurate and dynamic picture of the student’s knowledge state.
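One simple way to picture recency weighting is an exponentially decayed average, sketched below. The half-life parameter, the function name, and the sample history are assumptions made for illustration; the actual temporal models are not described in this post.

```python
def recency_weighted_score(responses, now, half_life_days=7.0):
    """Weighted fraction correct, where a response loses half its weight
    every `half_life_days`.

    responses: list of (timestamp_in_days, correct) pairs.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for timestamp, correct in responses:
        weight = 0.5 ** ((now - timestamp) / half_life_days)
        weighted_sum += weight * (1.0 if correct else 0.0)
        total_weight += weight
    # With no responses at all, fall back to a neutral 0.5.
    return weighted_sum / total_weight if total_weight else 0.5

# Two early misses, then an "Aha!" moment and two recent correct answers.
history = [(0.0, False), (1.0, False), (20.0, True), (21.0, True)]

unweighted = sum(correct for _, correct in history) / len(history)
weighted = recency_weighted_score(history, now=21.0)
```

Treating all four responses equally gives a score of 0.5, while the recency-weighted score is noticeably higher because the two correct answers are the most recent.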
Accounting for relationships between learning objectives
Adaptive learning requires constant, granular assessment on multiple learning objectives embedded in the learning experience. However, traditional IRT also does not account for the relationships between learning objectives. As discussed above, these relationships are an important part of the Knewton Knowledge Graph.
To remedy this shortcoming of IRT, Knewton has developed a novel way to incorporate these relationships in a Bayesian modeling framework, allowing us to combine prior beliefs about proficiency on related topics with the evidence provided by the student’s responses. This leads to so-called proficiency propagation, or the flow of proficiency throughout the Knowledge Graph.
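As a rough illustration of combining a prior belief with evidence, the sketch below uses the textbook Gaussian update, in which each source is weighted by its precision (inverse variance). The Gaussian assumption, the function name, and the numbers are ours, chosen for simplicity; this is not a description of Knewton's model.

```python
def combine(prior_mean, prior_var, evidence_mean, evidence_var):
    """Precision-weighted combination of a Gaussian prior and Gaussian evidence."""
    prior_precision = 1.0 / prior_var
    evidence_precision = 1.0 / evidence_var
    posterior_var = 1.0 / (prior_precision + evidence_precision)
    posterior_mean = posterior_var * (
        prior_precision * prior_mean + evidence_precision * evidence_mean
    )
    return posterior_mean, posterior_var

# Hypothetical numbers: a related learning objective suggests proficiency 1.0
# (fairly uncertain), while the student's own responses suggest 0.0 (more certain).
mean, var = combine(prior_mean=1.0, prior_var=1.0, evidence_mean=0.0, evidence_var=0.5)
```

The combined mean lands between the two inputs, closer to the more certain one, and the combined variance is smaller than either input variance.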
What does this look like in practice? If, in the Knowledge Graph below, a student is making progress toward the learning objective of “Solve word problems by subtracting two-digit numbers,” our Proficiency Model infers a high proficiency on that learning objective. The model also infers a high proficiency on the related learning objectives (“Subtract two-digit numbers” and “Subtract one-digit numbers”), even without direct evidence. The basic idea: If two learning objectives are related and a student masters one of them, there’s a good chance the student has also mastered the others.
The effectiveness of Knewton’s Proficiency Model
The many facets of the Proficiency Model – IRT-based network effects, temporal effects, and the Knowledge Graph structure – combine to produce a highly accurate picture of a student’s knowledge state. We use this picture to provide content that will increase that student’s level of proficiency. It’s also the basis of the actionable analytics we provide to students and instructors.
How effective is the Proficiency Model in helping students master learning objectives? In his post “Interpreting Knewton’s 2017 Student Mastery Results,” fellow Knerd Andrew D. Jones presents data showing that Knewton’s Proficiency Model helps students achieve mastery, and that mastery, as determined by the Proficiency Model, has a positive impact on students’ academic performance.
What does knowing something tell us about a related concept?
At Knewton, we’ve built an adaptive learning platform that powers digital education around the world based on cutting-edge algorithms that leverage the diverse datasets we receive. One of the core data-driven models that powers everything we do is our Proficiency Model, which we use to infer each student’s knowledge state. We do this by combining a “knowledge graph”, time-tested psychometric models, and additional pedagogically motivated modeling. We’ll show you how the relationships in the knowledge graph get realized in Knewton’s Proficiency Model and answer the question: “What does knowing something tell us about knowing a related concept?” This has important pedagogical consequences, as well as an enormous impact on how our recommendations get served (and how confident we can be in their accuracy!).
The Knowledge Graph
One of the core components of Knewton adaptivity is the knowledge graph. In general, a graph is composed of nodes and edges. In our case, the nodes represent independent concepts, and the edges represent prerequisite relationships between concepts. An edge between concepts A and B (A → B) can be read as “Concept A is prerequisite to concept B”. This means that the student generally must know concept A before being able to understand concept B. Consider the example portion of a knowledge graph below:
In math-speak this is a directed acyclic graph (DAG). We already covered what the “graph” part means. The “directed” part just means that the edges are directed, so that “A prerequisite to B” does not mean “B prerequisite to A” (we instead say “B postrequisite to A”). This is in contrast to undirected edges in social networks where, for example, “A is friends with B” does imply “B is friends with A”. The “acyclic” part of DAG means there are no cycles. A simple cycle would involve A → B → C → A. This would imply that you need to know A to know B, B to know C, and then C to know A! This is a horrible catch-22. You can never break the cycle and learn these concepts! Disallowing cycles in the graph allows us to represent a course, without contradictions, as starting with more basic concepts, and leading to more advanced concepts as the student progresses (this progression is top-to-bottom in the graph above).
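The “no cycles” requirement is easy to check in code. The sketch below uses Python's standard-library `graphlib` (Python 3.9+), which raises `CycleError` when no valid ordering exists. The concept labels A, B, C and the `study_order` helper are placeholders for illustration.

```python
from graphlib import CycleError, TopologicalSorter

def study_order(prerequisites):
    """Return concepts ordered so every prerequisite comes first,
    or None if the graph contains a cycle.

    prerequisites maps each concept to the set of its direct prerequisites.
    """
    try:
        return list(TopologicalSorter(prerequisites).static_order())
    except CycleError:
        return None

# A -> B -> C: a valid chain of prerequisites.
valid = {"B": {"A"}, "C": {"B"}}

# A -> B -> C -> A: the catch-22 described above.
cyclic = {"B": {"A"}, "C": {"B"}, "A": {"C"}}

order = study_order(valid)
broken = study_order(cyclic)
```

For the valid chain the ordering starts at A and ends at C; for the cyclic graph there is no place to start, so the helper returns `None`.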
Another crucial aspect of the knowledge graph is the content: i.e. the assessing questions and the instructional material. Each concept has a number of such content pieces attached, though we don’t show them in the picture above. You can think of them as living inside the node.
How do we know what you know?
Of course, we can never know exactly what you know. That’d be creepy! Instead we estimate the student knowledge state using a mathematical model called the Proficiency Model. This takes, as inputs, the observed history of a student’s interactions, the graph structure, and properties of the content (question difficulty, etc.) and outputs the student’s proficiency in all the concepts in the graph at a given point in time. This is summarized below:
Abstractly, proficiency on a concept refers to the ability for a student to perform tasks (such as answer questions correctly) related to that concept. Thus, we can use the estimated values of the proficiencies to predict whether the student answers future questions correctly or not. Comparing our predictions to reality provides valuable feedback that allows us to constantly update and improve our model and assumptions.
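One common way to compare predictions to reality is the average log loss, sketched here. The metric choice, the function name, and the sample numbers are assumptions for illustration; the post does not say which measure Knewton uses.

```python
import math

def log_loss(predicted, observed):
    """Average negative log-likelihood of the observed outcomes.

    predicted: probabilities that each response is correct.
    observed:  booleans, True if the response was actually correct.
    Lower is better.
    """
    eps = 1e-12  # keep probabilities away from exactly 0 or 1
    total = 0.0
    for p, was_correct in zip(predicted, observed):
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p) if was_correct else math.log(1.0 - p)
    return total / len(observed)

observed = [True, True, False, True]

informed = log_loss([0.9, 0.8, 0.2, 0.7], observed)       # leans the right way each time
uninformative = log_loss([0.5, 0.5, 0.5, 0.5], observed)  # always a coin flip
```

A model whose predictions lean in the right direction scores a lower loss than one that always predicts 50%, which is the kind of signal that can be used to check modeling assumptions.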
The foundation for our Proficiency Model is a well-tested educational testing theory known as Item Response Theory (IRT). One important aspect of IRT is that it benefits from network effects: we learn more about the content and the students as more people use the system, leading to better and better student outcomes. IRT also gives us a base on which we can build additional features.
One thing that basic IRT does not include is any notion of temporality. Thus older responses count the same as newer responses. This is fine in a testing environment, where “older” responses mean “generated 20 minutes ago”, but isn’t great in a learning environment. In a learning environment, we (obviously) expect that students will be learning, so we don’t want to overly penalize them for older work when in fact they may have had an “Aha!” moment. To remedy this, we’ve built temporal models into IRT that make more recent responses count more towards your proficiency estimate than older responses on a concept*.
Another thing that basic IRT does not account for is instructional effects. Consider the following example. Alice got 2 questions wrong, watched an informative video on the subject, and then got one question right. Under basic IRT we’d infer that her proficiency was the same as that of Bob, who got the same 2 questions wrong, did not watch the video, and then got one question correct. This doesn’t seem accurate. We should take Alice’s instructional interaction into account when inferring her knowledge state and deciding what’s best for her to work on next. We have extended IRT to take into account instructional effects.
Finally, basic IRT does not account for multiple concepts, nor their interrelationships in the knowledge graph. This will be the main focus of the rest of this post.
The titular question of this post: “What does knowing something tell us about knowing a related concept?” is answered through Proficiency Propagation. This refers to how proficiency flows (propagates) to different concepts in the knowledge graph.
To motivate why proficiency propagation is important, let’s consider two different scenarios.
First, consider the example shown below, where the only activity we’ve observed from Alice is that she performed well (a ✔ indicates a correct response) on several more advanced concepts.
We can’t know everything Alice has ever done in this course. She may have done a lot of work offline and answered tons of “Add whole numbers” questions correctly. Since we don’t have access to this information, we have to make our best inference. Note that all three concepts Alice excelled at rely on “Add whole numbers” as a prerequisite. Let’s revisit the definition of the prerequisite relationship. We say “A is prerequisite to B” (A → B) if A must be mastered in order to understand B. In other words:
Concept B is mastered ⇒ Concept A is mastered
In our case, there are three different “concept B’s” that Alice has clearly mastered. Thus, by definition of the prerequisite relationship Alice almost certainly has mastered “Add whole numbers” (it’s the concept A). So let’s paint that green, indicating likely mastery.
By similar reasoning, if Alice has mastered “Add whole numbers”, then she has likely mastered its prerequisite “Understand the definition of whole numbers and their ordering”. However, we might be slightly less certain about this inference, since it is more indirect and relies on a chain of reasoning. So let’s paint that slightly less bright green:
What about the remaining two concepts? First consider “Multiply whole numbers”. Alice has mastered its prerequisite, which is promising. But she may have never received any instruction on multiplication, and may have never even heard of such a thing! On the other hand, she may be a prolific multiplier, having done lots of work on it in an offline setting. In this case, the definition of “prerequisite” doesn’t work in our favor to give us a clean inference. But certainly if we had to guess, we’d say Alice is more likely to have mastered “Multiply whole numbers” than someone we have no information on. Thus, we give Alice a small benefit-of-the-doubt proficiency increase from the baseline. Similar considerations apply to the last, most advanced concept:
Let’s summarize the lessons we’ve learned:
- Mastery (i.e. correct responses) propagates strongly ‘backwards’ to prerequisites.
- As we get further from direct evidence in the prerequisite chain, there is more uncertainty. Thus we infer slightly less mastery.
- Mastery propagates weakly ‘forwards’ to postrequisites.
Now let’s consider Bob, who has struggled on “Add whole numbers”, getting 3 incorrect:
Recall our deconstruction of the prerequisite relationship A → B:
Concept B is mastered ⇒ Concept A is mastered
Unfortunately, this doesn’t directly help us here, because Bob hasn’t mastered any concepts as far as we know. However, the contrapositive is exactly what we need:
Concept A is not mastered ⇒ Concept B is not mastered
Let’s take “struggling on” to be equivalent to “not mastered” for our purposes to get:
Struggling on Concept A ⇒ Struggling on Concept B
Thus, we now know that struggling-ness propagates strongly down to the postrequisites of “Add whole numbers”!
What about “Understand the definition of whole numbers and their ordering”? Similarly to the flipped situation of propagating mastery to postrequisites, we cannot make any strong pedagogical inferences just from the prerequisite relationship. However, we can still assert that it is more likely that Bob is struggling on it given we’ve seen him struggle on “Add whole numbers” than if we hadn’t seen him struggle on that concept:
Let’s summarize what we’ve learned about propagation of struggling-ness:
- Struggling (i.e. incorrect responses) propagates strongly forwards to postrequisites.
- As we get further from direct evidence in the postrequisite chain, there is more uncertainty. Thus we infer slightly less struggling.**
- Struggling propagates weakly backwards to prerequisites.
Notice these rules are just the mirror-opposites of the ones for propagating mastery! And all of this comes simply from the definition of “prerequisite-ness”, and some pedagogical reasoning.
While we now have a nice picture of how we want proficiency propagation to behave, that doesn’t count for much unless we can rigorously define a mathematical model capturing this behavior, and code up an algorithm to efficiently compute proficiencies in real time for all possible cases. As they say, the devil is in the details. To give a flavor of what’s involved, here are some of the technical requirements our mathematical model and algorithm must satisfy:
- Convexity: This essentially means that the proficiencies are efficiently and reliably computable.
- Strong propagation of mastery up to prerequisites, and of struggling-ness down to postrequisites, with a slight decay in propagation strength at each ‘hop’ in the graph.
- Weak propagation of mastery down to postrequisites, and of struggling-ness up to prerequisites, with a large decay in propagation strength at each ‘hop’ in the graph.
- The above two points imply asymmetric propagation: The impact of a response on neighboring proficiencies is asymmetric, always being stronger in one direction in the graph than the other.
- All of this proficiency propagation stuff must also play nicely with the aforementioned IRT model and the extensions to include temporality and instructional effects.
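To give a feel for the asymmetric propagation rules above, here is a deliberately simple toy sketch. It multiplies a signed signal by a per-hop factor, using a large factor in the “strong” direction and a small one in the “weak” direction. The `strong` and `weak` values, the function name, and the five-concept chain are all illustrative assumptions; this is not the convex model described in this post, and it ignores IRT, temporality, and instructional effects.

```python
def propagate(prereqs, source, signal, strong=0.8, weak=0.2):
    """Spread a signed signal from `source` through the graph.

    signal > 0 means mastery, signal < 0 means struggling.
    prereqs maps each concept to the list of its direct prerequisites.
    Returns {concept: propagated signal}.
    """
    # Invert the prerequisite map to get postrequisites.
    postreqs = {}
    for concept, parents in prereqs.items():
        for parent in parents:
            postreqs.setdefault(parent, []).append(concept)

    # Mastery flows strongly to prerequisites; struggling flows strongly to postrequisites.
    up_factor = strong if signal > 0 else weak
    down_factor = weak if signal > 0 else strong

    result = {source: signal}
    for neighbors, factor in ((prereqs, up_factor), (postreqs, down_factor)):
        frontier = [(source, signal)]
        while frontier:
            node, value = frontier.pop()
            for nxt in neighbors.get(node, []):
                new_value = value * factor  # decay at each hop
                # Keep the strongest signal if a concept is reachable by several paths.
                if nxt not in result or abs(new_value) > abs(result[nxt]):
                    result[nxt] = new_value
                    frontier.append((nxt, new_value))
    return result

# Hypothetical chain of concepts: A -> B -> C -> D -> E, with evidence observed on C.
chain = {"B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"]}

mastery = propagate(chain, "C", +1.0)
struggling = propagate(chain, "C", -1.0)
```

With evidence of mastery on C, the prerequisites B and A receive most of the signal while the postrequisites D and E receive only a little. With evidence of struggling on C, the picture is mirrored: D and E receive most of the (negative) signal, and B and A only a little.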
Coming up with a well-defined mathematical model encoding asymmetric strong propagation is a challenging and fun problem. Come work at Knewton if you want to learn more details!
Putting it all together
So what good exactly does having this fancy proficiency model do us? At the end of the day, students care about being served a good educational experience (and ultimately, progressing forward through their schooling), and in Knewton-land that inevitably means getting served good recommendations. Certainly, having a pedagogically-sound and accurate proficiency model does not automatically lead to good recommendations. But having a bad proficiency model almost certainly will lead to bad recommendations. A good proficiency model is necessary, but not sufficient for good recommendations.
Our recommendations rely on models built on top of the Proficiency Model, and answer questions such as:
- What are useful concepts to work on next?
- Has the student mastered the goal material?
- How much instructional gain will this material yield for the student?
- How much will this piece of material improve our understanding of the student’s knowledge state and therefore what she should focus on next?
All of these questions can only be answered when equipped with an accurate understanding of the student’s knowledge state. As an example, consider Alice again. If we had a bare-bones proficiency model that did not propagate her mastery to “Add whole numbers”, we might consider this a valid concept to recommend material from. This could lead to a frustrating experience, and the feeling that Knewton was broken: “Why am I being recommended this basic stuff that I clearly already know?!”
At the end of the day, it’s user experience stories like this that motivate much of the complex data analysis and mathematical modeling we do at Knewton. And it’s what motivates us to keep pushing the limit on how we can best improve student learning outcomes.
* There are other temporal effects that kick in if you’ve seen the same question more than once recently.
** There is a whole other layer of complexity in our Proficiency Model that we’ve glossed over. We actually estimate a student’s proficiency and a measure of our confidence in that estimate. These are the proficiency mean and variance, and can be combined to obtain confidence intervals, for example. For the purposes of this blog post, we are only considering the propagation of proficiency means.
This post was written by Michael Binger, a data scientist at Knewton.