David Kuntz is Vice President, Research at Knewton, where he builds the CATs for its online GMAT course.
I’ve received a number of inquiries from the community about the GMAT algorithm, so I thought it best to reply in article form. Here are some frequently asked questions about computer-adaptive tests (CATs).
1. What’s an algorithm?
An algorithm, generally, is a set of well-defined steps, usually efficient ones, that are followed to solve some pre-defined problem. In the case of a CAT algorithm, the problem is to estimate a student’s ability reliably and in a reasonable amount of time. Some CAT algorithms seek to solve this problem by selecting one question at a time, with each subsequent question chosen based on all of the student’s prior responses. Other algorithms look only at the most recently answered question. Still others evaluate responses to specific groups of questions.
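To make "based on all of the student’s prior responses" concrete, here is a minimal sketch of one way an ability estimate could be updated from a full response history. It assumes a one-parameter (Rasch) response model, a standard normal prior, and a simple grid approximation; the names `p_correct` and `estimate_ability` are hypothetical, and nothing here is claimed to be the method the GMAT itself uses.

```python
import math

def p_correct(theta, b):
    """Rasch model (an assumption): chance that a student of ability
    theta answers a question of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses):
    """Posterior-mean ability estimate over a grid from -4 to 4.

    responses: list of (difficulty, answered_correctly) pairs covering
    every question the student has answered so far.
    """
    grid = [i / 10.0 for i in range(-40, 41)]
    weights = []
    for theta in grid:
        # Standard normal prior (unnormalized), an illustrative choice.
        w = math.exp(-0.5 * theta * theta)
        for b, correct in responses:
            p = p_correct(theta, b)
            w *= p if correct else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total

# The estimate moves up after correct answers and down after misses.
after_two_right = estimate_ability([(0.0, True), (0.5, True)])
after_two_wrong = estimate_ability([(0.0, False), (-0.5, False)])
```

Because every response enters the product, an early answer keeps influencing the estimate, which is the difference from an algorithm that looks only at the most recent question.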
CAT algorithms also vary with regard to the explicit criteria they use to select the next question (or sets of questions) to administer. Some try to minimize total measurement error. Others try to maximize the precision and accuracy of measurement for each question administered. Still others try to select questions that will most refine the current ability estimate. As a consequence, CAT algorithms can differ greatly from one another, depending on the specific implementation and the intent of the developers.
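One of these criteria, picking the question that most refines the current estimate, is often expressed through an "information" value for each question. The sketch below assumes a two-parameter logistic model in which each question has a discrimination `a` and a difficulty `b`; the function names and the tiny pool are invented for illustration and do not describe any real test.

```python
import math

def p_correct(theta, a, b):
    """Two-parameter logistic model (an assumption)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of one question at ability theta under 2PL."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def pick_next(theta, pool):
    """Return the (a, b) question in pool that is most informative at theta."""
    return max(pool, key=lambda item: item_information(theta, *item))

# Hypothetical pool: equal discrimination, three difficulty levels.
pool = [(1.0, -1.5), (1.0, 0.0), (1.0, 1.5)]
easy_pick = pick_next(-1.4, pool)  # low estimate: the easy question wins
hard_pick = pick_next(1.6, pool)   # high estimate: the hard question wins
```

With equal discrimination, the most informative question is simply the one whose difficulty sits closest to the current estimate. A different criterion, such as minimizing total measurement error, would swap out the `key` function rather than the overall loop.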
2. Why does the GMAT use an algorithm when the linear LSAT seems to be a pretty decent gauge of proficiency?
A common goal in using a CAT algorithm is to reduce the number of questions a student must answer before the test can estimate the student’s ability to a specified level of reliability. CATs are often more efficient than linear tests, so they need fewer questions to reach that level. The LSAT needs over 100 items to get there, while the GMAT needs fewer than 80 to reach a comparable level.
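The efficiency argument can be illustrated with a toy calculation. Under a Rasch model (an assumption), the standard error of an ability estimate is roughly one over the square root of the total information collected, and a question is most informative when its difficulty matches the student’s ability. The sketch below counts how many questions are needed to reach a target standard error when difficulties are spread across a fixed form versus perfectly targeted. The numbers are invented, a real CAT only approximates perfect targeting because the ability is unknown, and this is not meant to reproduce the LSAT or GMAT figures above.

```python
import math

def rasch_information(theta, b):
    """Information from one question of difficulty b at ability theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def items_needed(theta, difficulty_cycle, target_se):
    """Count questions, drawn cyclically from difficulty_cycle, until the
    approximate standard error 1/sqrt(total information) hits target_se."""
    needed_information = 1.0 / target_se ** 2
    total, n = 0.0, 0
    while total < needed_information:
        total += rasch_information(theta, difficulty_cycle[n % len(difficulty_cycle)])
        n += 1
    return n

theta = 1.0                                # hypothetical true ability
fixed_form = [-2.0, -1.0, 0.0, 1.0, 2.0]   # linear test: spread of difficulties
targeted = [1.0]                           # idealized CAT: always on target

n_fixed = items_needed(theta, fixed_form, target_se=0.5)
n_adaptive = items_needed(theta, targeted, target_se=0.5)
```

In this toy setup the targeted sequence reaches the same standard error with fewer questions than the fixed form, because questions far from the student’s level contribute little information.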
3. Is the entire GMAT adaptive?
Almost all large-scale standardized tests contain some number of “experimental” or “pretest” questions that are administered to the student but do not count toward the student’s final score. This is simply a way for the test makers to gather data on the questions, in order to determine how difficult they are and how well they distinguish between students at different ability levels. They also use the data collected to identify bad questions, so that they can eliminate or fix them before they count.
Some tests, like the LSAT, include all of the pretest questions in a single section. Others, like the GMAT, intermingle the pretest questions with the operational ones. Which section is the pretest section, and which questions are the pretest questions, is usually a well-guarded secret. It is generally bad strategy to spend time trying to guess whether a given question is operational or not. The price of guessing incorrectly is just too high.
4. How does the GMAT select which questions I get?
CATs like the GMAT have a blueprint: a set of specifications (difficulty, question type, content area, etc.) that define which questions you see. At the same time, each question has certain statistical characteristics that the algorithm uses, based on your response, to estimate your quantitative or verbal ability. The algorithm looks at your performance on the questions you have already answered and at the characteristics of each question remaining in the pool. It then selects the question that both best satisfies the blueprint and provides the most statistical information, in order to generate the best estimate of your ability.
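As a rough sketch of how a blueprint and statistical information might be combined, the example below first filters the pool down to questions the blueprint still allows, then picks the most informative one at the current ability estimate. The two-parameter logistic model, the per-area quota form of the blueprint, and every name and number here are assumptions made for illustration; the actual GMAT selection rules are not public.

```python
import math
from collections import Counter

def information(theta, a, b):
    """2PL information (an assumed model) of a question at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_next(theta, pool, administered, blueprint):
    """Pick the most informative unused question from a content area
    whose blueprint quota is not yet filled; None if nothing qualifies.

    pool, administered: lists of dicts with keys id, area, a, b.
    blueprint: dict mapping content area -> number of questions required.
    """
    seen = Counter(item["area"] for item in administered)
    used_ids = {item["id"] for item in administered}
    open_areas = {area for area, quota in blueprint.items() if seen[area] < quota}
    candidates = [
        item for item in pool
        if item["id"] not in used_ids and item["area"] in open_areas
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda it: information(theta, it["a"], it["b"]))

# Hypothetical pool and blueprint.
pool = [
    {"id": "alg1", "area": "algebra", "a": 1.0, "b": 0.0},
    {"id": "alg2", "area": "algebra", "a": 1.0, "b": 0.1},
    {"id": "geo1", "area": "geometry", "a": 1.0, "b": 2.0},
]
blueprint = {"algebra": 1, "geometry": 1}

first = select_next(0.0, pool, [], blueprint)
second = select_next(0.0, pool, [first], blueprint)
```

Here the second pick is the geometry question even though the remaining algebra question is statistically more informative, because the algebra quota is already filled. That trade-off is the sense in which a question must satisfy the blueprint and the statistics at the same time.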