There is emerging evidence that the performance of risk assessment instruments is weaker when used for clinical decision‐making than for research purposes. For instance, research has found lower agreement between evaluators when the risk assessments are conducted during routine practice. We examined the field interrater reliability of the Short‐Term Assessment of Risk and Treatability: Adolescent Version (START:AV). Clinicians in a Dutch secure youth care facility completed START:AV assessments as part of the treatment routine. Consistent with previous literature, interrater reliability of the items and total scores was lower than previously reported in non‐field studies. Nevertheless, moderate to good interrater reliability was found for final risk judgments on most adverse outcomes. Field studies provide insights into the actual performance of structured risk assessment in real‐world settings, exposing factors that affect reliability. This information is relevant for those who wish to implement structured risk assessment with a level of reliability that is defensible considering the high stakes.