A data science approach for early-stage prediction of Patient's susceptibility to acute side effects of advanced radiotherapy

M. Aldraimli; D. Soria; D. Grishchuck; S. Ingram; R. Lyon; A. Mistry; J. Oliveira; R. Samuel; L.E.A. Shelley; S. Osman; M.V. Dwek; D. Azria; J. Chang-Claude; S. Gutierrez-Enriquez; M.C. De Santis; B.S. Rosenstein; D. De Ruysscher; E. Sperk; R.P. Symonds; H. Stobart; A. Vega; L. Veldeman; A. Webb; C.J. Talbot; C.M. West; T. Rattay; T.J. Chaussalet; REQUITE consortium

doi:10.1016/j.compbiomed.2021.104624

Corresponding authors: M. Aldraimli and C.M. West

Research output: Contribution to journal (article, academic, peer-reviewed)

Abstract

Predicting the incidence of side effects of a given medical treatment, framed as a classification task, is a common challenge in medical research. Machine learning (ML) methods are widely used for risk prediction and classification; their primary objective in this setting is to use several features to predict a dichotomous response (e.g., disease positive/negative). Like statistical inference modelling, ML modelling is subject to the class imbalance problem: models are dominated by the majority class, which increases the false-negative rate. In this study, seventy-nine ML models were built and evaluated to classify approximately 2000 participants from 26 hospitals in eight countries into two groups by radiotherapy (RT) side-effect incidence, based on observations recorded in REQUITE, the international study of RT-related toxicity. We also examined the effect of sampling techniques and cost-sensitive learning methods on how the models dealt with class imbalance. The combinations of these techniques had a significant impact on classification, improving incidence-status prediction by shifting the classifiers' attention to the minority group. The best classification model for RT acute toxicity prediction was identified using domain experts' success criteria. The area under the receiver operating characteristic curve (AUC) of the models, tested on an isolated dataset, ranged from 0.50 to 0.77. The scale of the improvement is promising and will guide further development of models to predict acute RT toxicities. One model was optimised and found to be useful for identifying patients at risk of developing early-stage acute toxicities as a result of breast RT, so that relevant treatment interventions can be appropriately targeted. The approach presented in this paper produced a preclinically valid prediction model.
The study was developed through a multidisciplinary collaboration of data scientists, medical physicists, oncologists and surgeons in the UK Radiotherapy Machine Learning Network.
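The abstract refers to sampling techniques (SMOTE is among the keywords), cost-sensitive learning and AUC, but this record does not describe the authors' pipeline, libraries or settings. As a hedged illustration only, the standard-library Python sketch below shows generic versions of these three ideas: SMOTE-style oversampling that interpolates between a minority sample and one of its nearest minority neighbours, inverse-frequency ("balanced") class weights as a simple cost-sensitive scheme, and AUC computed as the probability that a randomly chosen positive is scored above a randomly chosen negative. The function names (`smote_like`, `balanced_class_weights`, `roc_auc`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter

# Generic, hypothetical helpers illustrating techniques named in the abstract;
# this is NOT the REQUITE study's implementation.


def balanced_class_weights(labels):
    """Inverse-frequency weights n / (k * n_c): rarer classes get larger weights,
    so misclassifying them costs more (a simple cost-sensitive scheme)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def smote_like(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling: each synthetic point lies on the segment between
    a random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Euclidean k nearest neighbours of x among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic


def roc_auc(labels, scores):
    """AUC as the probability that a random positive (label 1) is scored above a
    random negative (label 0), counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [0] * 8 + [1] * 2  # toy, imbalanced: 8 negatives, 2 positives
    print(balanced_class_weights(labels))  # {0: 0.625, 1: 2.5}
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    print(roc_auc(labels, [0.5] * 10))  # constant scores: 0.5 (chance level)
    print(smote_like([(1.0, 1.0), (2.0, 2.0), (3.0, 1.0)], n_new=2, k=2))
```

In a typical workflow, oversampling is applied to the training split only, so that performance on a held-out set (such as the isolated test dataset mentioned in the abstract) is not inflated by synthetic points; class weights like these are usually passed to a learner's per-class or per-sample weighting option. A classifier that gives every patient the same score has an AUC of exactly 0.5, which corresponds to chance-level discrimination.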

Original language: English
Article number: 104624
Number of pages: 20
Journal: Computers in Biology and Medicine
Volume: 135
DOI: https://doi.org/10.1016/j.compbiomed.2021.104624
Publication status: Published, 1 Aug 2021

Keywords

ACUTE SKIN TOXICITY
BREAST-CANCER
COHORT
CURVE
Classification
Desquamation
Early toxicities
Imbalanced learning
MODEL
Machine learning
Meta-learning
NTCP
PARAMETERS
RADIATION
REQUITE
Radiotherapy
SMOTE
THERAPY


Cite this

Aldraimli, M., Soria, D., Grishchuck, D., Ingram, S., Lyon, R., Mistry, A., Oliveira, J., Samuel, R., Shelley, L. E. A., Osman, S., Dwek, M. V., Azria, D., Chang-Claude, J., Gutierrez-Enriquez, S., De Santis, M. C., Rosenstein, B. S., De Ruysscher, D., Sperk, E., Symonds, R. P., ... REQUITE consortium (2021). A data science approach for early-stage prediction of Patient's susceptibility to acute side effects of advanced radiotherapy. Computers in Biology and Medicine, 135, Article 104624. https://doi.org/10.1016/j.compbiomed.2021.104624

@article{ad2eab2a39964b9abb5caeb7d2806ed8,

title = "A data science approach for early-stage prediction of Patient's susceptibility to acute side effects of advanced radiotherapy",

keywords = "ACUTE SKIN TOXICITY, BREAST-CANCER, COHORT, CURVE, Classification, Desquamation, Early toxicities, Imbalanced learning, MODEL, Machine learning, Meta-learning, NTCP, PARAMETERS, RADIATION, REQUITE, Radiotherapy, SMOTE, THERAPY",

author = "M. Aldraimli and D. Soria and D. Grishchuck and S. Ingram and R. Lyon and A. Mistry and J. Oliveira and R. Samuel and L.E.A. Shelley and S. Osman and M.V. Dwek and D. Azria and J. Chang-Claude and S. Gutierrez-Enriquez and {De Santis}, M.C. and B.S. Rosenstein and {De Ruysscher}, D. and E. Sperk and R.P. Symonds and H. Stobart and A. Vega and L. Veldeman and A. Webb and C.J. Talbot and C.M. West and T. Rattay and T.J. Chaussalet and {REQUITE consortium}",

year = "2021",

month = aug,

day = "1",

doi = "10.1016/j.compbiomed.2021.104624",

language = "English",

volume = "135",

journal = "Computers in Biology and Medicine",

issn = "0010-4825",

publisher = "Elsevier Science",

}

Aldraimli, M, Soria, D, Grishchuck, D, Ingram, S, Lyon, R, Mistry, A, Oliveira, J, Samuel, R, Shelley, LEA, Osman, S, Dwek, MV, Azria, D, Chang-Claude, J, Gutierrez-Enriquez, S, De Santis, MC, Rosenstein, BS, De Ruysscher, D, Sperk, E, Symonds, RP, Stobart, H, Vega, A, Veldeman, L, Webb, A, Talbot, CJ, West, CM, Rattay, T, Chaussalet, TJ & REQUITE consortium 2021, 'A data science approach for early-stage prediction of Patient's susceptibility to acute side effects of advanced radiotherapy', Computers in Biology and Medicine, vol. 135, 104624. https://doi.org/10.1016/j.compbiomed.2021.104624

TY - JOUR

T1 - A data science approach for early-stage prediction of Patient's susceptibility to acute side effects of advanced radiotherapy

AU - Aldraimli, M.

AU - Soria, D.

AU - Grishchuck, D.

AU - Ingram, S.

AU - Lyon, R.

AU - Mistry, A.

AU - Oliveira, J.

AU - Samuel, R.

AU - Shelley, L.E.A.

AU - Osman, S.

AU - Dwek, M.V.

AU - Azria, D.

AU - Chang-Claude, J.

AU - Gutierrez-Enriquez, S.

AU - De Santis, M.C.

AU - Rosenstein, B.S.

AU - De Ruysscher, D.

AU - Sperk, E.

AU - Symonds, R.P.

AU - Stobart, H.

AU - Vega, A.

AU - Veldeman, L.

AU - Webb, A.

AU - Talbot, C.J.

AU - West, C.M.

AU - Rattay, T.

AU - Chaussalet, T.J.

AU - REQUITE consortium

PY - 2021/8/1

Y1 - 2021/8/1

KW - ACUTE SKIN TOXICITY

KW - BREAST-CANCER

KW - COHORT

KW - CURVE

KW - Classification

KW - Desquamation

KW - Early toxicities

KW - Imbalanced learning

KW - MODEL

KW - Machine learning

KW - Meta-learning

KW - NTCP

KW - PARAMETERS

KW - RADIATION

KW - REQUITE

KW - Radiotherapy

KW - SMOTE

KW - THERAPY

U2 - 10.1016/j.compbiomed.2021.104624

DO - 10.1016/j.compbiomed.2021.104624

M3 - Article

C2 - 34247131

SN - 0010-4825

VL - 135

JO - Computers in Biology and Medicine

JF - Computers in Biology and Medicine

M1 - 104624

ER -