Three myths about risk thresholds for prediction models

Laure Wynants; Maarten van Smeden; David J. McLernon; Dirk Timmerman; Ewout W. Steyerberg; Ben Van Calster; Topic Group ‘Evaluating diagnostic tests and prediction models’ of the STRATOS initiative

doi:10.1186/s12916-019-1425-3

Corresponding author: Laure Wynants

Research output: Contribution to journal (academic, peer-reviewed article)

Abstract

Background: Clinical prediction models are useful in estimating a patient's risk of having a certain disease or experiencing an event in the future based on their current characteristics. Defining an appropriate risk threshold to recommend intervention is a key challenge in bringing a risk prediction model to clinical application; such risk thresholds are often defined in an ad hoc way. This is problematic because the tacitly assumed costs of false positive and false negative classifications may not be clinically sensible. For example, when choosing the risk threshold that maximizes the proportion of patients correctly classified, false positives and false negatives are assumed to be equally costly. Furthermore, small to moderate sample sizes may lead to unstable optimal thresholds, which calls for a particularly cautious interpretation of results.

Main text: We discuss how three common myths about risk thresholds often lead to inappropriate risk stratification of patients. First, we point out the contexts of counseling and shared decision-making in which a continuous risk estimate is more useful than risk stratification. Second, we argue that threshold selection should reflect the consequences of the decisions made following risk stratification. Third, we emphasize that there is usually no universally optimal threshold; rather, a plausible risk threshold depends on the clinical context. Consequently, we recommend presenting results for multiple risk thresholds when developing or validating a prediction model.

Conclusion: Bearing these three considerations in mind can avoid inappropriate allocation (and non-allocation) of interventions. Using discriminating and well-calibrated models will generate better clinical outcomes if context-dependent thresholds are used.
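The abstract's point that a threshold tacitly encodes misclassification costs can be made concrete with a short Python sketch. It assumes the standard expected-cost argument: treating is preferred when (1 − p)·C_FP ≤ p·C_FN, so a threshold t implies that a false positive is weighted t/(1 − t) times as heavily as a false negative. For a well-calibrated model, the accuracy-maximizing threshold of 0.5 therefore corresponds to equal costs, while t = 0.2 treats a missed case as four times as costly as an unnecessary intervention. The sketch then reports net benefit (the measure used in decision curve analysis) at several thresholds, in the spirit of presenting results for multiple thresholds. The function names, the chosen thresholds, and the eight toy patients are hypothetical illustrations, not code or data from the paper.

```python
"""Minimal sketch: what a risk threshold implies about misclassification costs,
and how to report net benefit at several thresholds instead of a single one.
All names and data below are hypothetical illustrations."""

from typing import Sequence


def implied_cost_ratio(threshold: float) -> float:
    """Return C_FP / C_FN implied by treating patients with risk >= threshold.

    Treating is preferred when (1 - p) * C_FP <= p * C_FN, i.e. when
    p >= C_FP / (C_FP + C_FN); solving for the cost ratio gives t / (1 - t).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return threshold / (1.0 - threshold)


def net_benefit(risks: Sequence[float], outcomes: Sequence[int], threshold: float) -> float:
    """Net benefit of 'treat if risk >= threshold': TP/n - FP/n * t/(1 - t)."""
    if len(risks) != len(outcomes) or not risks:
        raise ValueError("risks and outcomes must be non-empty and of equal length")
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - fp / n * implied_cost_ratio(threshold)


def net_benefit_treat_all(outcomes: Sequence[int], threshold: float) -> float:
    """Net benefit of treating everyone, a common reference strategy."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * implied_cost_ratio(threshold)


# Hypothetical toy data: estimated risks and observed outcomes (1 = event).
RISKS = [0.05, 0.15, 0.30, 0.60, 0.80, 0.12, 0.40, 0.70]
OUTCOMES = [0, 0, 1, 1, 1, 0, 0, 1]

if __name__ == "__main__":
    # Report several thresholds rather than one "optimal" cut-off.
    for t in (0.1, 0.2, 0.5):
        print(
            f"t={t:.2f}  FP:FN cost ratio={implied_cost_ratio(t):.3f}  "
            f"NB(model)={net_benefit(RISKS, OUTCOMES, t):.4f}  "
            f"NB(treat all)={net_benefit_treat_all(OUTCOMES, t):.4f}"
        )
```

Running the script prints one line per threshold. In decision-curve reasoning, a model is typically considered worthwhile at a given threshold only when its net benefit exceeds both treating everyone and treating no one (whose net benefit is 0). Which thresholds are clinically plausible is a context-dependent judgement, so the tuple (0.1, 0.2, 0.5) is only a placeholder.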

Original language	English
Article number	192
Number of pages	7
Journal	BMC Medicine
Volume	17
Issue number	1
DOIs	https://doi.org/10.1186/s12916-019-1425-3
Publication status	Published - 25 Oct 2019

Keywords

Clinical risk prediction model
Threshold
Decision support techniques
Risk
Data science
Diagnosis
Prognosis
ROC CURVE
CANCER
INDEX
SPECIFICITY
SENSITIVITY
PERFORMANCE
VALIDATION
MORTALITY
AREA

Access to Document

10.1186/s12916-019-1425-3 (Licence: CC BY)

Cite this

@article{7b219de22c424131bf89a899e91fa774,

title = "Three myths about risk thresholds for prediction models",

abstract = "Background Clinical prediction models are useful in estimating a patient's risk of having a certain disease or experiencing an event in the future based on their current characteristics. Defining an appropriate risk threshold to recommend intervention is a key challenge in bringing a risk prediction model to clinical application; such risk thresholds are often defined in an ad hoc way. This is problematic because tacitly assumed costs of false positive and false negative classifications may not be clinically sensible. For example, when choosing the risk threshold that maximizes the proportion of patients correctly classified, false positives and false negatives are assumed equally costly. Furthermore, small to moderate sample sizes may lead to unstable optimal thresholds, which requires a particularly cautious interpretation of results. Main text We discuss how three common myths about risk thresholds often lead to inappropriate risk stratification of patients. First, we point out the contexts of counseling and shared decision-making in which a continuous risk estimate is more useful than risk stratification. Second, we argue that threshold selection should reflect the consequences of the decisions made following risk stratification. Third, we emphasize that there is usually no universally optimal threshold but rather that a plausible risk threshold depends on the clinical context. Consequently, we recommend to present results for multiple risk thresholds when developing or validating a prediction model. Conclusion Bearing in mind these three considerations can avoid inappropriate allocation (and non-allocation) of interventions. Using discriminating and well-calibrated models will generate better clinical outcomes if context-dependent thresholds are used.",

keywords = "Clinical risk prediction model, Threshold, Decision support techniques, Risk, Data science, Diagnosis, Prognosis, ROC CURVE, CANCER, INDEX, SPECIFICITY, SENSITIVITY, PERFORMANCE, VALIDATION, MORTALITY, AREA",

author = "Laure Wynants and {van Smeden}, Maarten and McLernon, {David J.} and Dirk Timmerman and Steyerberg, {Ewout W.} and {Van Calster}, Ben and {Topic Group {\textquoteleft}Evaluating diagnostic tests and prediction models{\textquoteright} of the STRATOS initiative}",

note = "Funding Information: The study is supported by the Research Foundation-Flanders (FWO) project G0B4716N and Internal Funds KU Leuven (project C24/15/037). Laure Wynants is a post-doctoral fellow of the Research Foundation – Flanders (FWO). The funding bodies had no role in the design of the study, collection, analysis, interpretation of data, nor in writing the manuscript. Publisher Copyright: {\textcopyright} 2019 The Author(s).",

year = "2019",

month = oct,

day = "25",

doi = "10.1186/s12916-019-1425-3",

language = "English",

volume = "17",

journal = "BMC Medicine",

issn = "1741-7015",

publisher = "BioMed Central Ltd",

number = "1",

}

TY  - JOUR
T1  - Three myths about risk thresholds for prediction models
AU  - Wynants, Laure
AU  - van Smeden, Maarten
AU  - McLernon, David J.
AU  - Timmerman, Dirk
AU  - Steyerberg, Ewout W.
AU  - Van Calster, Ben
AU  - Topic Group ‘Evaluating diagnostic tests and prediction models’ of the STRATOS initiative
N1  - Funding Information: The study is supported by the Research Foundation-Flanders (FWO) project G0B4716N and Internal Funds KU Leuven (project C24/15/037). Laure Wynants is a post-doctoral fellow of the Research Foundation – Flanders (FWO). The funding bodies had no role in the design of the study, collection, analysis, interpretation of data, nor in writing the manuscript. Publisher Copyright: © 2019 The Author(s).
PY  - 2019/10/25
Y1  - 2019/10/25
N2  - Background Clinical prediction models are useful in estimating a patient's risk of having a certain disease or experiencing an event in the future based on their current characteristics. Defining an appropriate risk threshold to recommend intervention is a key challenge in bringing a risk prediction model to clinical application; such risk thresholds are often defined in an ad hoc way. This is problematic because tacitly assumed costs of false positive and false negative classifications may not be clinically sensible. For example, when choosing the risk threshold that maximizes the proportion of patients correctly classified, false positives and false negatives are assumed equally costly. Furthermore, small to moderate sample sizes may lead to unstable optimal thresholds, which requires a particularly cautious interpretation of results. Main text We discuss how three common myths about risk thresholds often lead to inappropriate risk stratification of patients. First, we point out the contexts of counseling and shared decision-making in which a continuous risk estimate is more useful than risk stratification. Second, we argue that threshold selection should reflect the consequences of the decisions made following risk stratification. Third, we emphasize that there is usually no universally optimal threshold but rather that a plausible risk threshold depends on the clinical context. Consequently, we recommend to present results for multiple risk thresholds when developing or validating a prediction model. Conclusion Bearing in mind these three considerations can avoid inappropriate allocation (and non-allocation) of interventions. Using discriminating and well-calibrated models will generate better clinical outcomes if context-dependent thresholds are used.
AB  - Background Clinical prediction models are useful in estimating a patient's risk of having a certain disease or experiencing an event in the future based on their current characteristics. Defining an appropriate risk threshold to recommend intervention is a key challenge in bringing a risk prediction model to clinical application; such risk thresholds are often defined in an ad hoc way. This is problematic because tacitly assumed costs of false positive and false negative classifications may not be clinically sensible. For example, when choosing the risk threshold that maximizes the proportion of patients correctly classified, false positives and false negatives are assumed equally costly. Furthermore, small to moderate sample sizes may lead to unstable optimal thresholds, which requires a particularly cautious interpretation of results. Main text We discuss how three common myths about risk thresholds often lead to inappropriate risk stratification of patients. First, we point out the contexts of counseling and shared decision-making in which a continuous risk estimate is more useful than risk stratification. Second, we argue that threshold selection should reflect the consequences of the decisions made following risk stratification. Third, we emphasize that there is usually no universally optimal threshold but rather that a plausible risk threshold depends on the clinical context. Consequently, we recommend to present results for multiple risk thresholds when developing or validating a prediction model. Conclusion Bearing in mind these three considerations can avoid inappropriate allocation (and non-allocation) of interventions. Using discriminating and well-calibrated models will generate better clinical outcomes if context-dependent thresholds are used.
KW  - Clinical risk prediction model
KW  - Threshold
KW  - Decision support techniques
KW  - Risk
KW  - Data science
KW  - Diagnosis
KW  - Prognosis
KW  - ROC CURVE
KW  - CANCER
KW  - INDEX
KW  - SPECIFICITY
KW  - SENSITIVITY
KW  - PERFORMANCE
KW  - VALIDATION
KW  - MORTALITY
KW  - AREA
U2  - 10.1186/s12916-019-1425-3
DO  - 10.1186/s12916-019-1425-3
M3  - Article
C2  - 31651317
SN  - 1741-7015
VL  - 17
JO  - BMC Medicine
JF  - BMC Medicine
IS  - 1
M1  - 192
ER  - 