Knewton VP of Research David Kuntz uses the English language to explain the numbers behind his Science.
Computer-adaptive tests (CATs) come in all shapes and sizes, and you meet them everywhere: getting your driver’s license, earning an IT certification like MCSE or CCNA, or applying for admission to business school. Even the TV show “Are You Smarter Than A Fifth-Grader?” can be considered an adaptive assessment, albeit self-adaptive, not computer-based. But they all have something in common: Instead of working through a set of questions in some predetermined order, you face questions that are selected (by you or someone else) based on how you have already responded to the questions you’ve seen.
Underlying every CAT is an algorithm that selects the next item to display. There are many such algorithms in use today. Some base selection on whether you answer a question correctly. Others adapt based on which specific incorrect response is selected. Still others look at overall performance on groups of questions. Some very advanced CATs don’t look like tests at all, and present tasks or activities based on what actions you took in the preceding activities.
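The simplest of these rules, adapting on whether you answered correctly, can be sketched in a few lines of Python. This is only an illustration, not any particular test's algorithm: the function name `next_difficulty` and the five difficulty levels are assumptions made up for the example.

```python
def next_difficulty(current, answered_correctly, lowest=1, highest=5):
    """Hypothetical rule: one level harder after a correct answer,
    one level easier after a miss, clamped to the available range."""
    step = 1 if answered_correctly else -1
    return max(lowest, min(highest, current + step))


# Walk through a short sequence of responses, starting in the middle.
level = 3
for correct in [True, True, False, True]:
    level = next_difficulty(level, correct)
print(level)  # 3 -> 4 -> 5 -> 4 -> 5, so this prints 5
```

Rules that react to the specific wrong answer, or to performance on a group of questions, follow the same shape: look at what just happened, then pick where to go next.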
The GMAT has a blueprint, a set of specifications (difficulty, question type, content area, etc.) that defines the structure and content of the test. Each question has statistical characteristics (e.g., that the question is hard or easy) and content characteristics (e.g., that the question is a Geometry item dealing with isosceles triangles). The algorithm looks at your performance on the questions you have already answered and the characteristics of each question remaining in the pool, and then selects for you the question that simultaneously best satisfies the blueprint and provides the most statistical information it can, to generate the best estimate of your ability. Since people at all ability levels take the test, a large number of questions is needed to provide an accurate assessment for every test-taker. All of these questions need to be carefully constructed, reviewed, and statistically aligned so that they contribute meaningfully to your ability estimate.
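To make "satisfies the blueprint and provides the most statistical information" a little more concrete, here is a minimal sketch. It assumes a two-parameter logistic (2PL) item response model and treats the blueprint as a simple cap on questions per content area. Both are simplifying assumptions for illustration, and the names `Item`, `information`, and `select_next` are made up; the actual GMAT selection algorithm is not described in this post.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    content_area: str
    discrimination: float  # how sharply the item separates ability levels
    difficulty: float      # ability level at which P(correct) = 0.5


def p_correct(theta, item):
    """Probability of a correct answer under the assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-item.discrimination * (theta - item.difficulty)))


def information(theta, item):
    """Fisher information the item provides at ability theta (2PL)."""
    p = p_correct(theta, item)
    return item.discrimination ** 2 * p * (1.0 - p)


def select_next(theta, pool, seen_ids, counts, blueprint):
    """Pick the unseen item with the most information whose content
    area has not yet reached its blueprint quota; None if none is left."""
    eligible = [
        it for it in pool
        if it.item_id not in seen_ids
        and counts.get(it.content_area, 0) < blueprint.get(it.content_area, 0)
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda it: information(theta, it))


pool = [
    Item("g1", "geometry", 1.0, 0.0),
    Item("g2", "geometry", 1.2, 1.5),
    Item("a1", "algebra", 0.8, 0.1),
    Item("a2", "algebra", 1.5, -2.0),
]
blueprint = {"geometry": 1, "algebra": 2}

# One geometry item already seen, so the geometry quota is full.
choice = select_next(0.0, pool, {"g1"}, {"geometry": 1}, blueprint)
print(choice.item_id)  # a1: the most informative algebra item near theta = 0
```

Notice that the hardest remaining item is not the one chosen. The most informative question is the one whose difficulty sits closest to the current ability estimate, which is why a pool needs questions across the whole difficulty range.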
So what makes a good CAT? In addition to a rich pool of questions of varying difficulty, it requires a robust algorithm to estimate your ability, a fast and reliable mechanism to identify the best question for you to see next, and a powerful scoring algorithm that translates the ability estimate into something meaningful. It also needs a whole series of mechanisms to ensure that you don’t see the same questions over again when you take the test more than once, and don’t see too much of one content category or another.
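As a rough picture of the first and last of those pieces, the sketch below estimates ability from a handful of responses and then maps the estimate onto a reporting scale. It assumes the same 2PL model as a stand-in, uses a grid-based expected a posteriori (EAP) estimate with a standard normal prior, and converts to a 0-to-100 scale with a linear formula that is entirely made up. Real scoring tables are built very differently; the function names here are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct answer (an assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def estimate_ability(responses, grid_min=-4.0, grid_max=4.0, steps=161):
    """EAP ability estimate over an evenly spaced grid.

    responses: list of (discrimination, difficulty, correct) tuples.
    """
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        theta = grid_min + i * (grid_max - grid_min) / (steps - 1)
        # Unnormalized standard normal prior times the response likelihood.
        weight = math.exp(-0.5 * theta * theta)
        for a, b, correct in responses:
            p = p_correct(theta, a, b)
            weight *= p if correct else (1.0 - p)
        numerator += theta * weight
        denominator += weight
    return numerator / denominator


def scaled_score(theta, low=0, high=100):
    """Hypothetical linear mapping from ability in [-4, 4] to [low, high]."""
    clipped = max(-4.0, min(4.0, theta))
    return round(low + (clipped + 4.0) / 8.0 * (high - low))


responses = [(1.0, 0.0, True), (1.2, 0.8, True), (0.9, 1.5, False)]
theta_hat = estimate_ability(responses)
print(round(theta_hat, 2), scaled_score(theta_hat))
```

Each new response reshapes the estimate a little, and the selection mechanism then uses the updated estimate to pick the next question.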
As you can imagine, CATs are tricky to build and maintain. One of the great things about the Knewton CAT is that it was developed by the people who actually made the GMAT (and GRE) CATs. So all of the algorithms that select questions, estimate your ability, and score your test are as close to the real thing as you can get without actually sending your scores to a business school.
We’ll talk about how scoring works for GMAT in another post. Until then, do your homework.