Student Latent State Estimation with the Kalman Filter

The Kalman filter is an algorithm that estimates the values of unknown, or latent, variables from a series of noisy measurements taken over time. The Kalman filter has numerous applications in finance and different areas of engineering, and in this blog post, we will show how the Kalman filter can apply to student learning. We will use a Kalman filter to infer an approximation to a student’s ability given their response times to a set of questions.

To fully understand the concepts below, you’ll need a background in basic probability and statistics.

Model

To begin, we are going to describe our inference problem and a model of the system – a set of assumptions of how student learning works.

Given a series of scalar measurements:

Z=\{z_1,z_2,...,z_n\}

where z_i denotes the natural logarithm of the time it took our student to answer question i,

We want to infer the scalar latent variables:

X=\{x_1,x_2,...,x_n\}

where the student has ability x_i at the time question i is answered.

We model the change in student ability over time as a Gaussian random walk, meaning that the current ability value is based on the previous ability value, plus some Gaussian noise:

x_k=x_{k-1}+w_k, where w_k is drawn from N(0, \tau \sigma^2_x), where \tau is the time the student spent between questions, and \sigma^2_x is a hyperparameter that corresponds to the variance for this latent process (1).

Having the variance of the noise increase linearly with the time difference makes our equation consistent with a continuous Gaussian random walk, and is consistent with our intuition that a student’s ability needs time to change. In other words, a student is unlikely to experience significant change in ability if they’re between questions in a quiz, but if it’s been a day since they’ve answered a question, they’re more likely to have taken the time to learn more about the concept, or, conversely, they might have forgotten the material. Because the latent state variance is time-dependent, our filter is technically called a hybrid Kalman filter, since it assumes a continuous-time model for a discrete set of observations.

We don’t assume that the student ability x_k accounts for all the variability in the log of the student response time, z_k. For example, it’s possible that the student is familiar with this particular problem or is distracted by her environment. Therefore, we say that the observed times are likewise corrupted by some Gaussian noise, v_k:

z_k=x_k+v_k, where v_k is drawn from N(0, \sigma^2_z), where \sigma^2_z is a hyperparameter that corresponds to the variance of the Gaussian noise (2).


Kalman Filter blog post (v2)

The resulting model is pictured by the diagram above: the ability of the student at the previous question x_{k-1} determines the log response time z_{k-1}. The ability of the student at current question x_k is determined by the ability at the previous question x_{k-1}, and determines the current log response time, z_k.


Inference

The Kalman filter is an algorithm for estimating each x_k a posteriori — that is, it computes an estimate for each x_k given observations Z = \{z_1, z_2,...,z_k\}. It does this recursively, which means that it only needs an estimate of x_{k-1}, along with observation z_k, to output an estimate for x_k.

To make a new estimate, we first need to compute two intermediate pieces of information: a prior distribution and a likelihood distribution.

A note on syntax:

z_{1:k-1} denotes our observations z_1 through z_{k-1}, and x_{k-1|k-1} represents our estimate of ability at the k-1th question given observations z_{1:k-1}; likewise \sigma^2_{k-1|k-1} represents our estimate of the variance given observations z_{1:k-1}.

f denotes the Gaussian probability density function (pdf) with mean \mu  and variance \sigma^2:

f(x;\mu,\sigma^2) = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{(x-\mu)^2}{2 \sigma^2}}

Calculating our prior distribution

Our prior term represents the knowledge we have of current latent state x_k having seen everything but our current observation z_{k}.

To calculate our prior, we start off with an estimate of x_{k-1}, represented as a Gaussian distribution with mean x_{k-1|k-1} and variance \sigma^2_{k-1|k-1}:

p(x_{k-1} |z_{1:k-1}) = f(x_{k-1};x_{k-1|k-1}, \sigma^2_{k-1|k-1}) (3)

From equation 1, we see that x_k is simply the addition of two independent random Gaussian random variables, x_{k-1} and w_k.

From probability theory, we know that the probability density of the addition of two independent random variables can be expressed as a convolution of the two composite probability densities. It happens that the convolution of two Gaussians is also a Gaussian:

p(x_k|z_{1:k-1}) = \displaystyle\int p(x_{k-1}|z_{1:k-1}) p(x_k|x_{k-1}) dx_{k-1} = f(x_k; x_{k|k-1}, \sigma^2_{k|k-1}), where

x_{k|k-1}=x_{k-1|k-1} and

\sigma^2_{k|k-1}=\sigma^2_{k-1|k-1}+ \tau \sigma^2_x  (4)

We call p(x_k|z_{1:k-1}) the prior knowledge we have about x_k. The next step is to look at the current observation, z_k and see what information it adds.

Calculating our likelihood distribution

Our likelihood term represents the information our current observation z_k gives us about our latent state x_k.

From equation 2, we see that likelihood of our observation z_k given our hidden variable x_k is simply a Gaussian centered at x_k. This becomes our likelihood term:

p(z_k|x_k) = f(z_k; x_k, \sigma^2_z) (5)

Combining prior and likelihood

The Kalman filter combines the prior knowledge we have about x_k and our likelihood term in accordance with Bayes’ rule, by multiplying the prior term with the likelihood term. We call this resulting distribution the posterior, or our estimate of x_k given all the information we have.

Luckily, the multiplication of two Gaussians is still a Gaussian, although unnormalized:

p(x_k|z_{1:k}) = \frac{1}{Z} p(x_k|z_{1:k-1})*p(z_k|x_k) = f(x_k;x_{k|k}, \sigma^2_{k|k}), where Z is a normalizing constant, and where:

x_{k|k}=x_{k|k-1}+C_k(z_k-x_{k|k-1})

C_k=\frac{\sigma^2_{k|k-1}}{\sigma^2_{k|k-1} + \sigma^2_z}

\sigma^2_{k|k} = (\frac{1}{\sigma^2_{k|k-1}} + \frac{1}{\sigma^2_z})^{-1} (6)

To summarize, given a Gaussian posterior distribution for x_{k-1} (Equation 3) and a new observation z_k, the Kalman filter estimates a new Gaussian posterior for x_k (Equation 6). By updating the Kalman filter as we receive new observations, we can obtain fast, real-time estimates of our latent state.

Open Source Code

An open source implementation of this hybrid Kalman filter algorithm is on Knewton’s GitHub:

https://github.com/Knewton/Kalman

Authors

Sophie Chou is a senior at Columbia University majoring in Computer Science. She’s interested in Machine Learning, Natural Language Processing, and becoming a better teacher. You can follow her on twitter @mpetitchou.

Andersen Chen is a senior at Brown University, majoring in Mathematics and Computer Science. He’s interested in data science, startups, and machine learning.


What's this? You're reading N choose K, the Knewton tech blog. We're crafting the Knewton Adaptive Learning Platform that uses data from millions of students to continuously personalize the presentation of educational content according to learners' needs. Sound interesting? We're hiring.