Calibration: the Achilles heel of predictive analytics

Ben van Calster; David J. McLernon; Maarten van Smeden; Laure Wynants; Ewout W. Steyerberg; Topic Group ‘Evaluating diagnostic tests and prediction models’ of the STRATOS initiative

doi:10.1186/s12916-019-1466-7

Corresponding author: Ben van Calster

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background: The assessment of calibration performance of risk prediction models based on regression or more flexible machine learning algorithms receives little attention.

Main text: Herein, we argue that this needs to change immediately because poorly calibrated algorithms can be misleading and potentially harmful for clinical decision-making. We summarize how to avoid poor calibration at algorithm development and how to assess calibration at algorithm validation, emphasizing balance between model complexity and the available sample size. At external validation, calibration curves require sufficiently large samples. Algorithm updating should be considered for appropriate support of clinical practice.

Conclusion: Efforts are required to avoid poor calibration when developing prediction models, to evaluate calibration when validating models, and to update models when indicated. The ultimate aim is to optimize the utility of predictive analytics for shared decision-making and patient counseling.

Original language	English
Article number	230
Number of pages	7
Journal	BMC Medicine
Volume	17
Issue number	1
DOIs	https://doi.org/10.1186/s12916-019-1466-7
Publication status	Published - 16 Dec 2019

Keywords

Calibration
Risk prediction models
Predictive analytics
Overfitting
Heterogeneity
Model performance
LOGISTIC-REGRESSION MODELS
OVARIAN-CANCER
VALIDATION
IMPACT
IVF

Access to Document

https://doi.org/10.1186/s12916-019-1466-7 (Licence: CC BY)

Cite this

@article{1ae28d7e269d48bb94fc8409a2999783,

title = "Calibration: the Achilles heel of predictive analytics",

abstract = "Background: The assessment of calibration performance of risk prediction models based on regression or more flexible machine learning algorithms receives little attention. Main text: Herein, we argue that this needs to change immediately because poorly calibrated algorithms can be misleading and potentially harmful for clinical decision-making. We summarize how to avoid poor calibration at algorithm development and how to assess calibration at algorithm validation, emphasizing balance between model complexity and the available sample size. At external validation, calibration curves require sufficiently large samples. Algorithm updating should be considered for appropriate support of clinical practice. Conclusion: Efforts are required to avoid poor calibration when developing prediction models, to evaluate calibration when validating models, and to update models when indicated. The ultimate aim is to optimize the utility of predictive analytics for shared decision-making and patient counseling.",

keywords = "Calibration, Risk prediction models, Predictive analytics, Overfitting, Heterogeneity, Model performance, LOGISTIC-REGRESSION MODELS, OVARIAN-CANCER, VALIDATION, IMPACT, IVF",

author = "{van Calster}, Ben and McLernon, {David J.} and {van Smeden}, Maarten and Wynants, Laure and Steyerberg, {Ewout W.} and {Topic Group {\textquoteleft}Evaluating diagnostic tests and prediction models{\textquoteright} of the STRATOS initiative}",

note = "Funding Information: This work was funded by the Research Foundation – Flanders (FWO; grant G0B4716N) and Internal Funds KU Leuven (grant C24/15/037). The funders had no role in study design, data collection, data analysis, interpretation of results, or writing of the manuscript. Publisher Copyright: {\textcopyright} 2019 The Author(s).",

year = "2019",

month = dec,

day = "16",

doi = "10.1186/s12916-019-1466-7",

language = "English",

volume = "17",

journal = "BMC Medicine",

issn = "1741-7015",

publisher = "BioMed Central Ltd",

number = "1",

}

TY - JOUR

T1 - Calibration: the Achilles heel of predictive analytics

AU - van Calster, Ben

AU - McLernon, David J.

AU - van Smeden, Maarten

AU - Wynants, Laure

AU - Steyerberg, Ewout W.

AU - Topic Group ‘Evaluating diagnostic tests and prediction models’ of the STRATOS initiative

N1 - Funding Information: This work was funded by the Research Foundation – Flanders (FWO; grant G0B4716N) and Internal Funds KU Leuven (grant C24/15/037). The funders had no role in study design, data collection, data analysis, interpretation of results, or writing of the manuscript. Publisher Copyright: © 2019 The Author(s).

PY - 2019/12/16

Y1 - 2019/12/16

N2 - Background: The assessment of calibration performance of risk prediction models based on regression or more flexible machine learning algorithms receives little attention. Main text: Herein, we argue that this needs to change immediately because poorly calibrated algorithms can be misleading and potentially harmful for clinical decision-making. We summarize how to avoid poor calibration at algorithm development and how to assess calibration at algorithm validation, emphasizing balance between model complexity and the available sample size. At external validation, calibration curves require sufficiently large samples. Algorithm updating should be considered for appropriate support of clinical practice. Conclusion: Efforts are required to avoid poor calibration when developing prediction models, to evaluate calibration when validating models, and to update models when indicated. The ultimate aim is to optimize the utility of predictive analytics for shared decision-making and patient counseling.

AB - Background: The assessment of calibration performance of risk prediction models based on regression or more flexible machine learning algorithms receives little attention. Main text: Herein, we argue that this needs to change immediately because poorly calibrated algorithms can be misleading and potentially harmful for clinical decision-making. We summarize how to avoid poor calibration at algorithm development and how to assess calibration at algorithm validation, emphasizing balance between model complexity and the available sample size. At external validation, calibration curves require sufficiently large samples. Algorithm updating should be considered for appropriate support of clinical practice. Conclusion: Efforts are required to avoid poor calibration when developing prediction models, to evaluate calibration when validating models, and to update models when indicated. The ultimate aim is to optimize the utility of predictive analytics for shared decision-making and patient counseling.

KW - Calibration

KW - Risk prediction models

KW - Predictive analytics

KW - Overfitting

KW - Heterogeneity

KW - Model performance

KW - LOGISTIC-REGRESSION MODELS

KW - OVARIAN-CANCER

KW - VALIDATION

KW - IMPACT

KW - IVF

U2 - 10.1186/s12916-019-1466-7

DO - 10.1186/s12916-019-1466-7

M3 - Article

C2 - 31842878

SN - 1741-7015

VL - 17

JO - BMC Medicine

JF - BMC Medicine

IS - 1

M1 - 230

ER -