A critical evaluation of the validity and the reliability of global competency constructs for supervisor assessment of junior medical trainees

D. A. McGill; C. P. M. van der Vleuten; M. J. Clarke

doi:10.1007/s10459-012-9410-z

D. A. McGill^*, C. P. M. van der Vleuten, M. J. Clarke

^*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Supervisor assessments are critical for both formative and summative assessment in the workplace. Supervisor ratings remain an important source of such assessment in many educational jurisdictions even though there is ambiguity about their validity and reliability. The aims of this evaluation are to explore: (1) the construct validity of ward-based supervisor competency assessments; (2) the reliability of supervisors for observing any overarching domain constructs identified (factors); (3) the stability of factors across subgroups of contexts, supervisors and trainees; and (4) the position of the observations compared to the established literature. Evaluated assessments were all those used to judge intern (trainee) suitability to become an unconditionally registered medical practitioner in the Australian Capital Territory, Australia in 2007-2008. Initial construct identification is by traditional exploratory factor analysis (EFA) using principal component analysis with Varimax rotation. Factor stability is explored by EFA of subgroups defined by different contexts, such as hospital type, and by different types of supervisors and trainees. The unit of analysis is each assessment, and all available assessments are included without aggregation of any scores to obtain the factors. Reliability of identified constructs is estimated by variance components analysis of the summed trainee scores for each factor, together with the number of assessments needed to provide an acceptably reliable assessment using the construct; the reliability unit of analysis is the score for each factor for every assessment.
For the 374 assessments from 74 trainees and 73 supervisors, the EFA resulted in 3 factors identified from the scree plot, accounting for only 68 % of the variance, with factor 1 having features of a "general professional job performance" competency (eigenvalue 7.630; variance 54.5 %); factor 2 "clinical skills" (eigenvalue 1.036; variance 7.4 %); and factor 3 "professional and personal" competency (eigenvalue 0.867; variance 6.2 %). The percentages of trainee score variance for the summed competency item scores for factors 1, 2 and 3 were 40.4, 27.4 and 22.9 % respectively. The numbers of assessments needed to give a reliability coefficient of 0.80 were 6, 11 and 13 respectively. The factor structure remained stable for subgroups of female trainees, Australian graduate trainees, the central hospital, surgeons, staff specialists, visiting medical officers and the separation into single years. Physicians as supervisors, male trainees, and male supervisors all had a different grouping of items within 3 factors, all of which had competency items that collapsed into the predefined "face value" constructs of competence. These observations add new insights compared to the established literature. For the setting, most supervisors appear to be assessing a dominant construct domain which is similar to a general professional job performance competency. This global construct consists of individual competency items that supervisors spontaneously align and has acceptable assessment reliability. However, factor structure instability between different populations of supervisors and trainees means that subpopulations of trainees may be assessed differently and that some subpopulations of supervisors are assessing the same trainees with different constructs than other supervisors. The lack of competency criterion standardisation of supervisors' assessments brings into question the validity of this assessment method as currently used.
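
The arithmetic behind the reported figures can be illustrated with a minimal sketch. It is not the authors' analysis code, and it rests on two assumptions that the abstract does not state: (a) the percent trainee score variance is treated as the reliability of a single assessment and projected with the Spearman-Brown formula, and (b) the instrument has 14 competency items, a value inferred here only because eigenvalue / 14 reproduces the reported variance percentages. The function and constant names are hypothetical.

```python
def assessments_needed(trainee_variance_share, target=0.80):
    """Spearman-Brown projection (assumed method, not confirmed by the source).

    Returns the unrounded number of assessments whose average would reach
    the target reliability, given the share of score variance attributable
    to trainees for a single assessment.
    """
    p = trainee_variance_share
    return (target / (1.0 - target)) * ((1.0 - p) / p)


def variance_share(eigenvalue, n_items):
    """Proportion of total variance explained by one principal component
    when the analysis is run on a correlation matrix of n_items items."""
    return eigenvalue / n_items


# Values taken from the abstract.
TRAINEE_VARIANCE = {"factor 1": 0.404, "factor 2": 0.274, "factor 3": 0.229}
EIGENVALUES = {"factor 1": 7.630, "factor 2": 1.036, "factor 3": 0.867}

# Assumption: 14 items, inferred from 7.630 / 0.545; not stated in the abstract.
ASSUMED_N_ITEMS = 14

if __name__ == "__main__":
    for name, share in TRAINEE_VARIANCE.items():
        n = assessments_needed(share)
        print(f"{name}: {n:.2f} assessments for reliability 0.80")
    for name, eig in EIGENVALUES.items():
        pct = 100.0 * variance_share(eig, ASSUMED_N_ITEMS)
        print(f"{name}: {pct:.1f} % of variance")
```

Under these assumptions the unrounded projections are about 5.90, 10.60 and 13.47 assessments, which round to the reported 6, 11 and 13. Rounding up to whole assessments would instead give 14 for factor 3, so the published figure for that factor may reflect rounding to the nearest integer or a different estimation method.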

Original language	English
Pages (from-to)	701-725
Journal	Advances in Health Sciences Education
Volume	18
Issue number	4
DOIs	https://doi.org/10.1007/s10459-012-9410-z
Publication status	Published - Oct 2013

Keywords

Supervisor assessment
Workplace assessment
Competency assessment
Exploratory factor analysis
Reliability
Validity

Cite this

@article{68493d8c00b74e8b8a1830f509c264bb,

title = "A critical evaluation of the validity and the reliability of global competency constructs for supervisor assessment of junior medical trainees",

keywords = "Supervisor assessment, Workplace assessment, Competency assessment, Exploratory factor analysis, Reliability, Validity",

author = "McGill, {D. A.} and {van der Vleuten}, {C. P. M.} and Clarke, {M. J.}",

year = "2013",

month = oct,

doi = "10.1007/s10459-012-9410-z",

language = "English",

volume = "18",

pages = "701--725",

journal = "Advances in Health Sciences Education",

issn = "1382-4996",

publisher = "Springer, Cham",

number = "4",

}

TY - JOUR

T1 - A critical evaluation of the validity and the reliability of global competency constructs for supervisor assessment of junior medical trainees

AU - McGill, D. A.

AU - van der Vleuten, C. P. M.

AU - Clarke, M. J.

PY - 2013/10

Y1 - 2013/10

KW - Supervisor assessment

KW - Workplace assessment

KW - Competency assessment

KW - Exploratory factor analysis

KW - Reliability

KW - Validity

U2 - 10.1007/s10459-012-9410-z

DO - 10.1007/s10459-012-9410-z

M3 - Article

C2 - 23053869

SN - 1382-4996

VL - 18

SP - 701

EP - 725

JO - Advances in Health Sciences Education

JF - Advances in Health Sciences Education

IS - 4

ER -