Purpose
Competency-based anesthesia training programs require robust assessment of trainee performance and commonly combine different types of workplace-based assessment (WBA) covering multiple facets of practice. This study measured the reliability of WBAs in a large existing database and explored how they could be combined to optimize reliability for assessment decisions.

Methods
We used generalizability theory to measure the composite reliability of four different types of WBAs used by the Australian and New Zealand College of Anaesthetists: mini-Clinical Evaluation Exercise (mini-CEX), direct observation of procedural skills (DOPS), case-based discussion (CbD), and multi-source feedback (MSF). We then modified the number and weighting of WBA combinations to optimize reliability with fewer assessments.

Results
We analyzed 67,405 assessments from 1,837 trainees and 4,145 assessors. We assumed acceptable reliability for interim (intermediate stakes) and final (high stakes) decisions of 0.7 and 0.8, respectively. Depending on the combination of WBA types, 12 assessments allowed the 0.7 threshold to be reached where one assessment of any type has the same weighting, while 20 were required for reliability to reach 0.8. If the weighting of the assessments is optimized, acceptable reliability for interim and final decisions is possible with nine (e.g., two DOPS, three CbD, two mini-CEX, two MSF) and 15 (e.g., two DOPS, eight CbD, three mini-CEX, two MSF) assessments respectively.

Conclusions
Reliability is an important factor to consider when designing assessments, and measuring composite reliability can allow the selection of a WBA portfolio with adequate reliability to provide evidence for defensible decisions on trainee progression.
- CLINICAL EVALUATION EXERCISE
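
The relationship between the number of assessments, their weighting, and composite reliability can be illustrated with a minimal sketch. It assumes the standard multivariate generalizability-theory form, in which the composite universe-score variance is the weighted sum of person (co)variances across WBA types and the composite error variance is the weighted sum of each type's error variance divided by its number of assessments. The function names and all variance values below are hypothetical and chosen only for illustration; they are not the estimates from this study, and the study's actual design may differ.

```python
def composite_reliability(weights, n_assessments, person_cov, error_var):
    """Composite reliability for a weighted portfolio of assessment types.

    weights       -- weight given to each type's mean score
    n_assessments -- number of assessments of each type (all > 0)
    person_cov    -- universe-score (trainee) covariance matrix across types
    error_var     -- error variance of a single assessment of each type
    """
    k = len(weights)
    if any(n <= 0 for n in n_assessments):
        raise ValueError("each assessment type needs at least one assessment")

    # Universe-score variance of the composite: w' * Sigma_p * w
    universe = sum(
        weights[i] * weights[j] * person_cov[i][j]
        for i in range(k)
        for j in range(k)
    )
    # Error variance of the composite: averaging n_i assessments of a type
    # shrinks that type's error variance by a factor of n_i.
    error = sum(
        weights[i] ** 2 * error_var[i] / n_assessments[i] for i in range(k)
    )
    return universe / (universe + error)


def equal_per_assessment_weights(n_assessments):
    """Weights under which every single assessment counts the same."""
    total = sum(n_assessments)
    return [n / total for n in n_assessments]


if __name__ == "__main__":
    # HYPOTHETICAL variance components for four WBA types
    # (order: DOPS, CbD, mini-CEX, MSF); not taken from the study.
    person_cov = [
        [0.10, 0.05, 0.05, 0.04],
        [0.05, 0.12, 0.06, 0.05],
        [0.05, 0.06, 0.11, 0.05],
        [0.04, 0.05, 0.05, 0.15],
    ]
    error_var = [0.60, 0.40, 0.55, 0.30]
    counts = [2, 3, 2, 2]

    equal = equal_per_assessment_weights(counts)
    print("equal weighting:", round(composite_reliability(
        equal, counts, person_cov, error_var), 3))

    # A different (also hypothetical) weighting of the same nine assessments.
    custom = [0.15, 0.35, 0.20, 0.30]
    print("custom weighting:", round(composite_reliability(
        custom, counts, person_cov, error_var), 3))
```

Under these assumptions, searching over `counts` and `weights` for the smallest portfolio that reaches a target such as 0.7 or 0.8 would mirror the kind of optimization the abstract describes, although the method the authors actually used is not specified here.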