Objective: Each year over 1.5 million health care professionals attend emergency care courses. Despite high stakes for patients and extensive resources involved, little evidence exists on the quality of assessment. The aim of this study was to evaluate the validity and reliability of commonly used formats in assessing emergency care skills. Methods: Residents were assessed at the end of a 2-week emergency course; a subgroup was videotaped. Psychometric analyses were conducted to assess the validity and inter-rater reliability of the assessment instrument, which included a checklist, a 9-item competency scale and a global performance scale. Results: A group of 144 residents and 12 raters participated in the study; 22 residents were videotaped and re-assessed by 8 raters. The checklists showed limited validity and poor inter-rater reliability for the dimensions "correct'' and "timely'' (ICC=.30 and. 39 resp.). The competency scale had good construct validity, consisting of a clinical and a communication subscale. The internal consistency of the (sub)scales was high (alpha=.93/.91/.86). The inter-rater reliability was moderate for the clinical competency subscale (.49) and the global performance scale (.50), but poor for the communication subscale (.27). A generalizability study showed that for a reliable assessment 5-13 raters are needed when using checklists, and four when using the clinical competency scale or the global performance scale. Conclusions: This study shows poor validity and reliability for assessing emergency skills with checklists but good validity and moderate reliability with clinical competency or global performance scales. Involving more raters can improve the reliability substantially. Recommendations are made to improve this high stakes skill assessment.