Clinical Concept-Based Radiology Reports Classification Pipeline for Lung Carcinoma

Sneha Mithun; Ashish Kumar Jha; Umesh B Sherkhane; Vinay Jaiswar; Nilendu C Purandare; Andre Dekker; Sander Puts; Inigo Bermejo; V Rangarajan; Catharina M L Zegers; Leonard Wee

doi:10.1007/s10278-023-00787-z

Corresponding author: Sneha Mithun

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Rising incidence and mortality of cancer have led to an increasing amount of research in the field. To learn from preexisting data, it has become important to capture maximum information related to disease type, stage, treatment, and outcomes. Medical imaging reports are rich in this kind of information but are only present as free text. The extraction of information from such unstructured text reports is labor-intensive. The use of Natural Language Processing (NLP) tools to extract information from radiology reports can make it less time-consuming as well as more effective. In this study, we have developed and compared different models for the classification of lung carcinoma reports using clinical concepts. This study was approved by the institutional ethics committee as a retrospective study with a waiver of informed consent. A clinical concept-based classification pipeline for lung carcinoma radiology reports was developed using rule-based as well as machine learning models, and the models were compared. The machine learning models used were XGBoost and two deep learning model architectures with bidirectional long short-term memory (Bi-LSTM) neural networks. A corpus consisting of 1700 radiology reports, including computed tomography (CT) and positron emission tomography/computed tomography (PET/CT) reports, was used for development and testing. Five hundred one radiology reports from the MIMIC-III Clinical Database version 1.4 were used for external validation. The pipeline achieved an overall F1 score of 0.94 on the internal set and 0.74 on external validation, with the rule-based algorithm using expert input giving the best performance. Among the machine learning models, the Bi-LSTM_dropout model performed better than the ML model using XGBoost and the Bi-LSTM_simple model on the internal set, whereas on external validation, the Bi-LSTM_simple model performed relatively better than the other two. This pipeline can be used for clinical concept-based classification of radiology reports related to lung carcinoma from a huge corpus and also for automated annotation of these reports.
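
As a rough illustration of the two ideas the abstract combines, rule-based concept matching and F1 scoring, the following minimal Python sketch may help. It is not the authors' pipeline: the concept patterns, the toy report snippets, and their labels are all hypothetical, and the score shown is a plain binary F1 rather than the overall F1 reported in the study.

```python
import re

# Hypothetical concept lexicon: these terms are illustrative assumptions,
# not the expert-curated rules used in the study.
CONCEPT_PATTERNS = {
    "lung_carcinoma": re.compile(
        r"\b(lung (carcinoma|cancer)|adenocarcinoma|squamous cell carcinoma)\b",
        re.IGNORECASE,
    ),
}


def classify_report(text):
    """Return 1 if any concept pattern matches the report text, else 0."""
    return int(any(p.search(text) for p in CONCEPT_PATTERNS.values()))


def f1_score(y_true, y_pred):
    """Binary F1 = 2 * precision * recall / (precision + recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up report snippets with made-up labels (1 = lung carcinoma related).
reports = [
    ("Spiculated mass in right upper lobe, biopsy-proven adenocarcinoma.", 1),
    ("No focal lesion. Clear lung fields.", 0),
    ("Known case of lung carcinoma, post chemotherapy status.", 1),
    ("FDG-avid nodule suspicious for primary malignancy.", 1),
]
y_true = [label for _, label in reports]
y_pred = [classify_report(text) for text, _ in reports]
print(y_pred, round(f1_score(y_true, y_pred), 2))
```

On this toy data the rules miss the fourth report, which describes a suspicious nodule without naming a carcinoma concept, so recall falls below 1 while precision stays at 1. A fixed lexicon can likewise lose recall on reports written in a different style.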

Original language	English
Pages (from-to)	812-826
Number of pages	15
Journal	Journal of Digital Imaging
Volume	36
Issue number	3
Early online date	14 Feb 2023
DOIs	https://doi.org/10.1007/s10278-023-00787-z
Publication status	Published - Jun 2023

Keywords

Artificial intelligence (AI)
natural language processing (NLP)
Lung carcinoma
Deep learning
Big data analytics
Electronic Medical Records
Radiology report
Clinical concept extraction
Named entity recognition

Access to Document

10.1007/s10278-023-00787-z
Licence: CC BY

Cite this

@article{794d69d985bb45dd84c7d14fa46590ed,
  title = "Clinical Concept-Based Radiology Reports Classification Pipeline for Lung Carcinoma",
  abstract = "Rising incidence and mortality of cancer have led to an increasing amount of research in the field. To learn from preexisting data, it has become important to capture maximum information related to disease type, stage, treatment, and outcomes. Medical imaging reports are rich in this kind of information but are only present as free text. The extraction of information from such unstructured text reports is labor-intensive. The use of Natural Language Processing (NLP) tools to extract information from radiology reports can make it less time-consuming as well as more effective. In this study, we have developed and compared different models for the classification of lung carcinoma reports using clinical concepts. This study was approved by the institutional ethics committee as a retrospective study with a waiver of informed consent. A clinical concept-based classification pipeline for lung carcinoma radiology reports was developed using rule-based as well as machine learning models, and the models were compared. The machine learning models used were XGBoost and two deep learning model architectures with bidirectional long short-term memory (Bi-LSTM) neural networks. A corpus consisting of 1700 radiology reports, including computed tomography (CT) and positron emission tomography/computed tomography (PET/CT) reports, was used for development and testing. Five hundred one radiology reports from the MIMIC-III Clinical Database version 1.4 were used for external validation. The pipeline achieved an overall F1 score of 0.94 on the internal set and 0.74 on external validation, with the rule-based algorithm using expert input giving the best performance. Among the machine learning models, the Bi-LSTM_dropout model performed better than the ML model using XGBoost and the Bi-LSTM_simple model on the internal set, whereas on external validation, the Bi-LSTM_simple model performed relatively better than the other two. This pipeline can be used for clinical concept-based classification of radiology reports related to lung carcinoma from a huge corpus and also for automated annotation of these reports.",
  keywords = "Artificial intelligence (AI), natural language processing (NLP), Lung carcinoma, Deep learning, Big data analytics, Electronic Medical Records, Radiology report, Clinical concept extraction, Named entity recognition",
  author = "Sneha Mithun and Jha, {Ashish Kumar} and Sherkhane, {Umesh B} and Vinay Jaiswar and Purandare, {Nilendu C} and Andre Dekker and Sander Puts and Inigo Bermejo and V Rangarajan and Zegers, {Catharina M L} and Leonard Wee",
  note = "Publisher Copyright: {\textcopyright} 2023, The Author(s).",
  year = "2023",
  month = jun,
  doi = "10.1007/s10278-023-00787-z",
  language = "English",
  volume = "36",
  pages = "812--826",
  journal = "Journal of Digital Imaging",
  issn = "0897-1889",
  publisher = "Springer",
  number = "3",
}

TY  - JOUR
T1  - Clinical Concept-Based Radiology Reports Classification Pipeline for Lung Carcinoma
AU  - Mithun, Sneha
AU  - Jha, Ashish Kumar
AU  - Sherkhane, Umesh B
AU  - Jaiswar, Vinay
AU  - Purandare, Nilendu C
AU  - Dekker, Andre
AU  - Puts, Sander
AU  - Bermejo, Inigo
AU  - Rangarajan, V
AU  - Zegers, Catharina M L
AU  - Wee, Leonard
PY  - 2023/6
Y1  - 2023/6
AB  - Rising incidence and mortality of cancer have led to an increasing amount of research in the field. To learn from preexisting data, it has become important to capture maximum information related to disease type, stage, treatment, and outcomes. Medical imaging reports are rich in this kind of information but are only present as free text. The extraction of information from such unstructured text reports is labor-intensive. The use of Natural Language Processing (NLP) tools to extract information from radiology reports can make it less time-consuming as well as more effective. In this study, we have developed and compared different models for the classification of lung carcinoma reports using clinical concepts. This study was approved by the institutional ethics committee as a retrospective study with a waiver of informed consent. A clinical concept-based classification pipeline for lung carcinoma radiology reports was developed using rule-based as well as machine learning models, and the models were compared. The machine learning models used were XGBoost and two deep learning model architectures with bidirectional long short-term memory (Bi-LSTM) neural networks. A corpus consisting of 1700 radiology reports, including computed tomography (CT) and positron emission tomography/computed tomography (PET/CT) reports, was used for development and testing. Five hundred one radiology reports from the MIMIC-III Clinical Database version 1.4 were used for external validation. The pipeline achieved an overall F1 score of 0.94 on the internal set and 0.74 on external validation, with the rule-based algorithm using expert input giving the best performance. Among the machine learning models, the Bi-LSTM_dropout model performed better than the ML model using XGBoost and the Bi-LSTM_simple model on the internal set, whereas on external validation, the Bi-LSTM_simple model performed relatively better than the other two. This pipeline can be used for clinical concept-based classification of radiology reports related to lung carcinoma from a huge corpus and also for automated annotation of these reports.
KW  - Artificial intelligence (AI)
KW  - natural language processing (NLP)
KW  - Lung carcinoma
KW  - Deep learning
KW  - Big data analytics
KW  - Electronic Medical Records
KW  - Radiology report
KW  - Clinical concept extraction
KW  - Named entity recognition
U2  - 10.1007/s10278-023-00787-z
DO  - 10.1007/s10278-023-00787-z
M3  - Article
C2  - 36788196
SN  - 0897-1889
VL  - 36
SP  - 812
EP  - 826
JO  - Journal of Digital Imaging
JF  - Journal of Digital Imaging
IS  - 3
ER  -