Inter-rater variability as mutual disagreement: identifying raters' divergent points of view

Andrea Gingerich*, Susan E. Ramlo, Cees P. M. van der Vleuten, Kevin W. Eva, Glenn Regehr

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Whenever multiple observers provide ratings, even of the same performance, inter-rater variation is prevalent. The resulting 'idiosyncratic rater variance' is considered to be unusable error of measurement in psychometric models and is a threat to the defensibility of our assessments. Prior studies of inter-rater variation in clinical assessments have used open response formats to gather raters' comments and justifications. This design choice allows participants to use idiosyncratic response styles that could result in a distorted representation of the underlying rater cognition and skew subsequent analyses. In this study we explored rater variability using the structured response format of Q methodology. Physician raters viewed video-recorded clinical performances and provided Mini Clinical Evaluation Exercise (Mini-CEX) assessment ratings through a web-based system. They then shared their assessment impressions by sorting statements that described the most salient aspects of the clinical performance onto a forced quasi-normal distribution ranging from "most consistent with my impression" to "most contrary to my impression". Analysis of the resulting Q-sorts revealed, for each performance, distinct points of view shared by multiple physicians. The points of view corresponded with the ratings physicians assigned to the performance. Each point of view emphasized different aspects of the performance, with rapport-building and/or medical expertise skills being most salient. It was rare for the points of view to diverge based on disagreements regarding the interpretation of a specific aspect of the performance. As a result, physicians' divergent points of view on a given clinical performance cannot be easily reconciled into a single coherent assessment judgment that is impacted by measurement error. If inter-rater variability does not wholly reflect error of measurement, it is problematic for our current measurement models and poses challenges for how we are to adequately analyze performance assessment ratings.
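To make the analytic step concrete: Q methodology treats each rater's completed sort as a single variable, correlates the sorts across raters, and then factor-analyses that by-person correlation matrix so that raters loading on the same factor are interpreted as sharing a point of view. The sketch below is a toy illustration of that idea using principal components on invented Q-sort data; it is not the authors' analysis, and the rater sorts, grid values, and two-factor choice are assumptions made up solely for demonstration.

```python
import numpy as np

# Toy illustration only (not the study's data): 6 hypothetical raters each
# sort 9 statements onto a forced quasi-normal grid from -2 ("most contrary
# to my impression") to +2 ("most consistent with my impression"),
# with fixed counts per grid column (1, 2, 3, 2, 1).
q_sorts = np.array([
    [ 2,  1,  1,  0,  0,  0, -1, -1, -2],   # rater A
    [ 2,  1,  0,  1,  0, -1,  0, -1, -2],   # rater B (similar view to A)
    [-2, -1,  0, -1,  1,  0,  0,  1,  2],   # rater C
    [-2,  0, -1, -1,  0,  1,  0,  1,  2],   # rater D (similar view to C)
    [ 1,  2,  0,  0, -1,  0,  1, -2, -1],   # rater E
    [ 0,  2,  1,  0, -1, -1,  1, -2,  0],   # rater F (similar view to E)
], dtype=float)

# Q methodology correlates *persons*, not statements: each rater's whole
# sort is one variable, so the correlation matrix is raters x raters.
corr = np.corrcoef(q_sorts)

# Principal components of the by-person correlation matrix; raters loading
# strongly on the same component are read as sharing a point of view.
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
loadings = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

n_factors = 2  # retained components; a real analysis would justify this choice
for i, row in enumerate(loadings[:, :n_factors]):
    print(f"rater {chr(ord('A') + i)}: loadings {np.round(row, 2)}")
```

Dedicated Q-methodology software typically uses centroid extraction followed by varimax or judgmental rotation rather than plain principal components, but the grouping logic, i.e. clusters of raters who sorted the statements similarly, is the same.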

Original language: English
Pages (from-to): 819-838
Number of pages: 20
Journal: Advances in Health Sciences Education
Volume: 22
Issue number: 4
DOIs
Publication status: Published - Oct 2017

Keywords

  • Inter-rater variability
  • Mini-CEX
  • Q methodology
  • Rater-based assessment
  • Rater cognition
  • Workplace-based assessment
  • PERFORMANCE RATINGS
  • VARIANCE-COMPONENTS
  • CLINICAL COMPETENCE
  • SOCIAL JUDGMENTS
  • PERSON MODELS
  • BLACK-BOX
  • MINI-CEX
  • COGNITION
  • GENERALIZABILITY
  • RELIABILITY
