Development and validation of deep learning and BERT models for classification of lung cancer radiology reports

S. Mithun; Ashish Kumar Jha; Umesh B. Sherkhane; Vinay Jaiswar; Nilendu C. Purandare; V. Rangarajan; A. Dekker; Sander Puts; Inigo Bermejo; L. Wee

doi:10.1016/j.imu.2023.101294

Development and validation of deep learning and BERT models for classification of lung cancer radiology reports

S. Mithun^*, Ashish Kumar Jha, Umesh B. Sherkhane, Vinay Jaiswar, Nilendu C. Purandare, V. Rangarajan, A. Dekker, Sander Puts, Inigo Bermejo, L. Wee

^*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Purpose: Manual cohort building from radiology reports can be tedious. Natural Language Processing (NLP) can be used for automated cohort building. In this study, we have developed and validated an NLP approach based on deep learning (DL) to select lung cancer reports from a thoracic disease management group cohort. Materials and methods: 4064 radiology reports (CT and PET/CT) of a thoracic disease management group reported between 2014 and 2016 were used. These reports were anonymised, cleaned, text normalized and split into a training, testing, and validation set. External validation was performed on radiology reports from the MIMIC-III clinical database. We used three DL models, namely, Bi-LSTM_simple, Bi-LSTM_dropout, and Pre-trained _BERT model to predict if a report concerned lung cancer. We studied the effect of minority oversampling on all models. Results: Without oversampling, the F1 scores at 95% CI for Bi-LSTM_simple, Bi-LSTM_dropout and BERT were 0.89, 0.90, and 0.86; with oversampling, the F1 scores were 0.94, 0.94, and 0.9, on internal validation. On external validation the F1-scores of Bi-LSTM_simple, Bi-LSTM_dropout and BERT models were 0.63, 0.77 and 0.80 without oversampling and 0.72, 0.78 and 0.77 with oversampling. Conclusion: Pre-trained BERT model and Bi-LSTM_dropout models to predict a lung cancer report showed consistent performance on internal and external validation with the BERT model exhibiting superior performance. The overall F1 score decreased on external validation for both Bi-LSTM models with the Bi-LSTM_simple model showing a more significant drop. All models showed some improvement on minority oversampling.

Original language	English
Article number	101294
Number of pages	10
Journal	Informatics in Medicine Unlocked
Volume	40
Issue number	1
DOIs	https://doi.org/10.1016/j.imu.2023.101294
Publication status	Published - 1 Jan 2023

Access to Document

10.1016/j.imu.2023.101294Licence: CC BY

Cite this

@article{e0d1ecfadd864e758d77cafd2027cc94,

title = "Development and validation of deep learning and BERT models for classification of lung cancer radiology reports",

abstract = "Purpose: Manual cohort building from radiology reports can be tedious. Natural Language Processing (NLP) can be used for automated cohort building. In this study, we have developed and validated an NLP approach based on deep learning (DL) to select lung cancer reports from a thoracic disease management group cohort. Materials and methods: 4064 radiology reports (CT and PET/CT) of a thoracic disease management group reported between 2014 and 2016 were used. These reports were anonymised, cleaned, text normalized and split into a training, testing, and validation set. External validation was performed on radiology reports from the MIMIC-III clinical database. We used three DL models, namely, Bi-LSTM_simple, Bi-LSTM_dropout, and Pre-trained _BERT model to predict if a report concerned lung cancer. We studied the effect of minority oversampling on all models. Results: Without oversampling, the F1 scores at 95% CI for Bi-LSTM_simple, Bi-LSTM_dropout and BERT were 0.89, 0.90, and 0.86; with oversampling, the F1 scores were 0.94, 0.94, and 0.9, on internal validation. On external validation the F1-scores of Bi-LSTM_simple, Bi-LSTM_dropout and BERT models were 0.63, 0.77 and 0.80 without oversampling and 0.72, 0.78 and 0.77 with oversampling. Conclusion: Pre-trained BERT model and Bi-LSTM_dropout models to predict a lung cancer report showed consistent performance on internal and external validation with the BERT model exhibiting superior performance. The overall F1 score decreased on external validation for both Bi-LSTM models with the Bi-LSTM_simple model showing a more significant drop. All models showed some improvement on minority oversampling.",

author = "S. Mithun and Jha, {Ashish Kumar} and Sherkhane, {Umesh B.} and Vinay Jaiswar and Purandare, {Nilendu C.} and V. Rangarajan and A. Dekker and Sander Puts and Inigo Bermejo and L. Wee",

year = "2023",

month = jan,

day = "1",

doi = "10.1016/j.imu.2023.101294",

language = "English",

volume = "40",

journal = "Informatics in Medicine Unlocked",

issn = "2352-9148",

publisher = "Elsevier Limited",

number = "1",

}

TY - JOUR

T1 - Development and validation of deep learning and BERT models for classification of lung cancer radiology reports

AU - Mithun, S.

AU - Jha, Ashish Kumar

AU - Sherkhane, Umesh B.

AU - Jaiswar, Vinay

AU - Purandare, Nilendu C.

AU - Rangarajan, V.

AU - Dekker, A.

AU - Puts, Sander

AU - Bermejo, Inigo

AU - Wee, L.

PY - 2023/1/1

Y1 - 2023/1/1

N2 - Purpose: Manual cohort building from radiology reports can be tedious. Natural Language Processing (NLP) can be used for automated cohort building. In this study, we have developed and validated an NLP approach based on deep learning (DL) to select lung cancer reports from a thoracic disease management group cohort. Materials and methods: 4064 radiology reports (CT and PET/CT) of a thoracic disease management group reported between 2014 and 2016 were used. These reports were anonymised, cleaned, text normalized and split into a training, testing, and validation set. External validation was performed on radiology reports from the MIMIC-III clinical database. We used three DL models, namely, Bi-LSTM_simple, Bi-LSTM_dropout, and Pre-trained _BERT model to predict if a report concerned lung cancer. We studied the effect of minority oversampling on all models. Results: Without oversampling, the F1 scores at 95% CI for Bi-LSTM_simple, Bi-LSTM_dropout and BERT were 0.89, 0.90, and 0.86; with oversampling, the F1 scores were 0.94, 0.94, and 0.9, on internal validation. On external validation the F1-scores of Bi-LSTM_simple, Bi-LSTM_dropout and BERT models were 0.63, 0.77 and 0.80 without oversampling and 0.72, 0.78 and 0.77 with oversampling. Conclusion: Pre-trained BERT model and Bi-LSTM_dropout models to predict a lung cancer report showed consistent performance on internal and external validation with the BERT model exhibiting superior performance. The overall F1 score decreased on external validation for both Bi-LSTM models with the Bi-LSTM_simple model showing a more significant drop. All models showed some improvement on minority oversampling.

AB - Purpose: Manual cohort building from radiology reports can be tedious. Natural Language Processing (NLP) can be used for automated cohort building. In this study, we have developed and validated an NLP approach based on deep learning (DL) to select lung cancer reports from a thoracic disease management group cohort. Materials and methods: 4064 radiology reports (CT and PET/CT) of a thoracic disease management group reported between 2014 and 2016 were used. These reports were anonymised, cleaned, text normalized and split into a training, testing, and validation set. External validation was performed on radiology reports from the MIMIC-III clinical database. We used three DL models, namely, Bi-LSTM_simple, Bi-LSTM_dropout, and Pre-trained _BERT model to predict if a report concerned lung cancer. We studied the effect of minority oversampling on all models. Results: Without oversampling, the F1 scores at 95% CI for Bi-LSTM_simple, Bi-LSTM_dropout and BERT were 0.89, 0.90, and 0.86; with oversampling, the F1 scores were 0.94, 0.94, and 0.9, on internal validation. On external validation the F1-scores of Bi-LSTM_simple, Bi-LSTM_dropout and BERT models were 0.63, 0.77 and 0.80 without oversampling and 0.72, 0.78 and 0.77 with oversampling. Conclusion: Pre-trained BERT model and Bi-LSTM_dropout models to predict a lung cancer report showed consistent performance on internal and external validation with the BERT model exhibiting superior performance. The overall F1 score decreased on external validation for both Bi-LSTM models with the Bi-LSTM_simple model showing a more significant drop. All models showed some improvement on minority oversampling.

U2 - 10.1016/j.imu.2023.101294

DO - 10.1016/j.imu.2023.101294

M3 - Article

SN - 2352-9148

VL - 40

JO - Informatics in Medicine Unlocked

JF - Informatics in Medicine Unlocked

IS - 1

M1 - 101294

ER -