How well do workplace-based assessments support summative entrustment decisions? A multi-institutional generalisability study

Michael S. Ryan; Katherine A. Gielissen; Dongho Shin; Robert A. Perera; Maryellen Gusic; Gary Ferenchick; Allison Ownby; William B. Cutrer; Vivian Obeso; Sally A. Santen

doi:10.1111/medu.15291

Michael S. Ryan^*, Katherine A. Gielissen, Dongho Shin, Robert A. Perera, Maryellen Gusic, Gary Ferenchick, Allison Ownby, William B. Cutrer, Vivian Obeso, Sally A. Santen

^*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background: Assessment of the Core Entrustable Professional Activities (EPAs) for Entering Residency requires direct observation through workplace-based assessments (WBAs). Single-institution studies have demonstrated mixed findings regarding the reliability of WBAs developed to measure student progression towards entrustment. Factors such as faculty development, rater engagement and scale selection have been suggested to improve reliability. The purpose of this investigation was to conduct a multi-institutional generalisability study to determine the influence of specific factors on the reliability of WBAs.

Methods: The authors analysed WBA data obtained for clerkship-level students across seven institutions from 2018 to 2020. Institutions implemented a variety of strategies including selection of designated assessors (DAs), altered scales and different EPAs. Data were aggregated by these factors. Generalisability theory was then used to examine the internal structure validity evidence of the data. An unbalanced cross-classified random-effects model was used to decompose variance components. A phi coefficient of >0.7 was used as the threshold for acceptable reliability.

Results: Data from 53 565 WBAs were analysed, and a total of 77 generalisability studies were performed. Most data came from EPAs 1 (n = 17 118, 32%), 2 (n = 10 237, 19.1%) and 6 (n = 6000, 18.5%). Low variance attributed to the learner (<10%) was found for most (59/77, 76%) analyses, resulting in a relatively large number of observations required for reasonable reliability (range = 3 to >560, median = 60). Factors such as DA, scale or EPA were not consistently associated with improved reliability.

Conclusion: The results from this study describe relatively low reliability in the WBAs obtained across seven sites. Generalisability for these instruments may be less dependent on factors such as faculty development, rater engagement or scale selection. When used for formative feedback, data from these instruments may be useful. However, such instruments do not consistently provide reasonable reliability to justify their use in high-stakes summative entrustment decisions.
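
The link between a small learner variance share and a large number of required observations can be illustrated with a short sketch. It is not the authors' model: the study used an unbalanced cross-classified random-effects design, whereas the code below assumes a simplified single-facet case in which all non-learner variance is treated as absolute error that averages out over observations. The function names `phi` and `observations_needed` are hypothetical and chosen only for this illustration.

```python
import math


def phi(learner_share: float, n_obs: int) -> float:
    """Phi (absolute) coefficient for the mean of n_obs observations.

    Assumption: variance is expressed as proportions summing to 1, and
    everything that is not learner variance counts as absolute error.
    """
    error_share = 1.0 - learner_share
    return learner_share / (learner_share + error_share / n_obs)


def observations_needed(learner_share: float, target_phi: float = 0.7) -> int:
    """Smallest n_obs for which phi(learner_share, n_obs) reaches target_phi."""
    if not 0.0 < learner_share < 1.0:
        raise ValueError("learner_share must be strictly between 0 and 1")
    if not 0.0 < target_phi < 1.0:
        raise ValueError("target_phi must be strictly between 0 and 1")
    error_share = 1.0 - learner_share
    # Solve learner / (learner + error / n) >= target for n.
    n = (target_phi / (1.0 - target_phi)) * (error_share / learner_share)
    # Small tolerance so floating-point noise does not push an exact
    # integer result up to the next integer.
    return math.ceil(n - 1e-9)


if __name__ == "__main__":
    for share in (0.50, 0.10, 0.05):
        print(f"learner share {share:.0%}: "
              f"{observations_needed(share)} observations")
```

Under these assumptions, a learner share of 10% already calls for about 21 observations to reach a phi of 0.7, and the requirement grows quickly as the share falls further. The figures reported in the abstract come from the authors' full model and should not be expected to match this simplified calculation.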

Original language	English
Number of pages	13
Journal	Medical Education
DOIs	https://doi.org/10.1111/medu.15291
Publication status	E-pub ahead of print - 1 Jan 2024

Keywords

ENTRUSTABLE PROFESSIONAL ACTIVITIES
UNDERGRADUATE MEDICAL-EDUCATION
TRAINEES CLINICAL SKILLS
CORE EPAS
VALIDITY EVIDENCE
COMPETENCE
FACULTY
SCALES
OTTAWA

Access to Document

10.1111/medu.15291
Licence: CC BY-NC

Cite this

@article{3d221e0d6d554588b6134fa18aa3be08,

title = "How well do workplace-based assessments support summative entrustment decisions? A multi-institutional generalisability study",

keywords = "ENTRUSTABLE PROFESSIONAL ACTIVITIES, UNDERGRADUATE MEDICAL-EDUCATION, TRAINEES CLINICAL SKILLS, CORE EPAS, VALIDITY EVIDENCE, COMPETENCE, FACULTY, SCALES, OTTAWA",

author = "Ryan, {Michael S.} and Gielissen, {Katherine A.} and Shin, Dongho and Perera, {Robert A.} and Gusic, Maryellen and Ferenchick, Gary and Ownby, Allison and Cutrer, {William B.} and Obeso, Vivian and Santen, {Sally A.}",

year = "2024",

month = jan,

day = "1",

doi = "10.1111/medu.15291",

language = "English",

journal = "Medical Education",

issn = "0308-0110",

publisher = "Wiley",

}

TY - JOUR

T1 - How well do workplace-based assessments support summative entrustment decisions? A multi-institutional generalisability study

AU - Ryan, Michael S.

AU - Gielissen, Katherine A.

AU - Shin, Dongho

AU - Perera, Robert A.

AU - Gusic, Maryellen

AU - Ferenchick, Gary

AU - Ownby, Allison

AU - Cutrer, William B.

AU - Obeso, Vivian

AU - Santen, Sally A.

PY - 2024/1/1

Y1 - 2024/1/1

KW - ENTRUSTABLE PROFESSIONAL ACTIVITIES

KW - UNDERGRADUATE MEDICAL-EDUCATION

KW - TRAINEES CLINICAL SKILLS

KW - CORE EPAS

KW - VALIDITY EVIDENCE

KW - COMPETENCE

KW - FACULTY

KW - SCALES

KW - OTTAWA

U2 - 10.1111/medu.15291

DO - 10.1111/medu.15291

M3 - Article

SN - 0308-0110

JO - Medical Education

JF - Medical Education

ER -