From the Board of Trustees: ABR Oral Exam Scoring Processes Reduce Subjectivity
By ABR Associate Executive Directors Mary S. Newell, MD (Diagnostic Radiology); James B. Spies, MD, MPH (Interventional Radiology); Geoffrey S. Ibbott, PhD (Medical Physics); and Michael Yunes, MD (Radiation Oncology); and ABR Associate Director of Psychometrics Ben Babcock, PhD
December 2025;18(6):4

Scoring oral exams is unavoidably subjective at some level. To mitigate the impact of that subjectivity on individual candidates and on the validity of the overall assessment process, the ABR's psychometric processes emphasize standardization wherever possible.
One way we enhance standardization is by having exam committees determine specific performance elements (essential points an examinee should mention to demonstrate full understanding) for each case. A second is by holding category meetings before and after the exam periods to establish common ground among examiners regarding the subspecialty case set in that portion of the exam.
There are separate content categories in each radiology discipline. Each category is the subject of a videoconference session (20 to 30 minutes in duration) between one examiner and one candidate. The examiner determines a composite score for the category: the average of the individual scores for the cases discussed during the session. Examiners assign case scores as whole numbers from 68 to 72, and a passing score is at or above the standard threshold of 70. This reinforces that passing the exam does not require perfection, but it does require demonstrating the competence to function independently, effectively, and safely in service to patients. To avoid crosstalk bias, the examiner commits to the score before seeing the scores of the other examiners who met with the same candidate. In medical physics, each of a candidate's five examiners asks one question from each of five categories, and the composite score for each category is the average of the scores from the five examiners.
Across all four disciplines (including diagnostic radiology, with the upcoming return to the DR oral exam in 2028), 68 corresponds to a performance that is “absolutely unsatisfactory,” 69 is “marginally unsatisfactory,” 70 is “satisfactory,” 71 is “good,” and 72 is “outstanding.” Because the score is based on multiple cases within a category, a poor performance on one case can usually be offset by a better performance on one or more other cases; a candidate rarely fails because of errors in discussing a single case.
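To make the arithmetic concrete, here is a minimal sketch (in Python) of the per-category scoring described above. The function name, variable names, and sample scores are hypothetical illustrations, not ABR software or any real candidate's results; the sketch simply applies the published rules: whole-number case scores from 68 to 72, a category composite equal to their average, and a passing threshold of 70.

```python
# Minimal illustration of the composite scoring arithmetic described above.
# All names and sample values are hypothetical; this is not ABR software.

PASSING_THRESHOLD = 70  # "satisfactory" on the published 68-72 scale

VALID_SCORES = {68, 69, 70, 71, 72}  # case scores are whole numbers only

def category_composite(case_scores):
    """Average the whole-number case scores from one category session."""
    if not case_scores or any(s not in VALID_SCORES for s in case_scores):
        raise ValueError("Each case score must be a whole number from 68 to 72.")
    return sum(case_scores) / len(case_scores)

# Hypothetical session: one weak case (69) offset by stronger cases.
scores = [69, 71, 70, 71, 70]
composite = category_composite(scores)
verdict = "pass" if composite >= PASSING_THRESHOLD else "fail"
print(f"Composite: {composite:.1f} -> {verdict}")  # Composite: 70.2 -> pass
```

The sample run illustrates the offsetting effect noted above: a single 69 does not by itself pull the category below the threshold when the remaining cases are satisfactory or better.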
Scoring details vary by discipline and by the specific case being discussed. In radiation oncology, a passing score (70 or better) requires some combination of clinical and imaging assessment, consideration of the appropriateness of treatment options and planning, and recognition of relevant safety concerns (including organs at risk). In interventional radiology, cases require reasonable synthesis of clinical and imaging findings, appropriate treatment planning (including weighing risks against benefits), sound procedural technique, and the avoidance and treatment of potential complications. In medical physics, a passing score requires the candidate to demonstrate knowledge of clinical medical physics methods, imaging and treatment procedures, equipment performance, radiation protection standards, QA procedures, and problem-solving skills. For diagnostic radiology, most cases initially focus on observation (identification of the finding, its relevant features, and pertinent negatives). This is followed by synthesis (describing how the specific features of the finding or findings allow one to formulate a differential diagnosis, and proposing the most likely diagnosis) and then by management (e.g., correctly identifying the need for additional imaging, biopsy, or appropriate referral). Candidates vary in the degree to which they need prompting or redirection, but most of the exam experience attempts to replicate the communication and clinical skills expected of individuals who have successfully completed a rigorous training program.
Many new examiners are surprised to learn that the oral exams are far more consistent and objective than they might seem on the surface. Multiple data points within each category, combined with independent determinations made by examiners across multiple distinct sessions, lead to results that typically vary little across a candidate's aggregate performance. This is not unexpected; a candidate who can apply depth of knowledge and sound reasoning in one area would be expected to demonstrate those same skills in other portions of the domain within their discipline.
