Criterion-Referenced Scoring Compares Performance to a Standard
The ABR uses criterion-referenced scoring on all its computer-based exams, as do many healthcare certification and licensure testing programs.
What is it?
There are generally two types of scoring for standardized tests: norm-referenced and criterion-referenced. You may be familiar with norm-referenced scoring from widely used college entrance exams (e.g., SAT, GRE) or IQ tests (e.g., Wechsler Intelligence Scale for Children [WISC]). This type of scoring shows how well a test-taker performed in relation to other test-takers. Norm-referenced scoring allows us to answer, “How did this test-taker do compared to others?” by reporting scores as percentiles, stanines, or normal curve equivalents. Norms are established so that 25% of test-takers will score below the 25th percentile, 50% below the 50th percentile, 75% below the 75th percentile, and so on (see Figure 1).
In contrast, criterion-referenced scoring allows us to ask, “How did this test-taker perform relative to the standard set for the exam?” by reporting scores that show the level of achievement a test-taker has demonstrated on the test. Criterion-referenced scoring bases a test-taker’s score solely on his or her knowledge of the content, without reference to other candidates. A familiar criterion-referenced test is the knowledge test required to acquire a driver’s license. Achievement on the exam indicates a level of mastery of driving knowledge, which is then used to determine whether the test-taker should “Pass” and receive a license or “Fail” and not receive one. The would-be driver must meet or exceed a predetermined level of knowledge (specific standards, or criteria) set before the test is taken. The score indicates how much the test-taker knows in relation to the criteria – “Does this person know enough to drive safely?” – rather than how well this person drives compared with others.
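The pass/fail logic described above can be sketched in a few lines of code: the decision depends only on the individual score and a fixed standard, never on other test-takers. This is a minimal illustration; the 80% cut score and the function name are assumptions made for the example, not any actual licensing standard.

```python
# Criterion-referenced decision: compare one test-taker's score to a fixed,
# predetermined standard. The 80% cut score here is purely illustrative.
CUT_SCORE = 0.80

def criterion_referenced_result(items_correct: int, items_total: int) -> str:
    """Pass/fail depends only on this test-taker's own performance."""
    return "Pass" if items_correct / items_total >= CUT_SCORE else "Fail"

print(criterion_referenced_result(85, 100))  # 85% meets the standard: Pass
print(criterion_referenced_result(79, 100))  # 79% falls short: Fail
```

Note that nothing in the function refers to a cohort of other scores, which is the defining feature of this scoring method.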
Why does it matter?
Understanding the differences between norm-referenced and criterion-referenced scoring is important because it clarifies how a test score can be interpreted, as well as the context in which it was derived. One very important difference between these two scoring methods is what results are possible from each. Because norm-referenced scoring ranks test-takers relative to one another, at least 50% of test-takers will be below the 50th percentile. There is no true “Pass” or “Fail” for norm-referenced exams, though percentile rankings can have consequences. For example, a higher percentile ranking on a college entrance exam is likely to boost one’s chances of being accepted into a more prestigious school. With limited admissions spots, a score in the 95th percentile would be highly desirable. Refer to Figure 1 for a visual reference of the number of individuals who might be in the 95th percentile.
Criterion-referenced scoring sets the passing score without regard to how well test-takers performed as a group or cohort. Passing is based on whether one’s score is at or above the criterion. Because scoring is not tied to the performance of other test-takers, the number of people who may pass is limited only by the number of people who meet the standard. A normal distribution of scores is neither expected nor relevant. In Figure 2, the number of individuals meeting the criterion of 72% of items correct is substantially larger than the number in the 95th percentile (shown in Figure 1). If 72% was the “cut score” established before the exam was administered, then everyone who attained that score would pass. (For information on how cut scores are set for ABR exams, see Volunteer Gets Hands-On Knowledge of Angoff Process.)
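The contrast between the two methods can be made concrete by applying both to the same set of scores: the norm-referenced view ranks a score against the cohort, while the criterion-referenced view compares each score to the cut score and lets any number of people pass. The scores below are hypothetical, and the 72-point cut score simply echoes the illustrative figure used in this article.

```python
# Hypothetical cohort of exam scores (percent of items correct).
scores = [55, 60, 64, 68, 70, 72, 72, 75, 80, 88]
CUT_SCORE = 72  # illustrative cut score, as in the 72% example above

def percentile_rank(score, cohort):
    """Norm-referenced view: percent of the cohort scoring below this score."""
    return 100 * sum(s < score for s in cohort) / len(cohort)

# Norm-referenced: a 75 outranks 70% of this cohort...
print(percentile_rank(75, scores))  # 70.0

# ...but the criterion-referenced view ignores the cohort entirely:
passers = [s for s in scores if s >= CUT_SCORE]
print(len(passers))  # 5 test-takers meet the standard and pass
```

If every test-taker scored at or above 72, all ten would pass; under norm-referenced scoring, half of them would still fall below the 50th percentile.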
In summary, norm-referenced scoring is focused on how well a test-taker performs relative to all other test-takers, rather than on passing or failing. Criterion-referenced scoring, on the other hand, focuses exclusively on whether test-takers pass or fail; the performance of others has no bearing on an individual’s score. If test-takers meet the standard, they will pass the exam, regardless of how well they performed compared with their peers.