T-staging pulmonary oncology from radiological reports using natural language processing: translating into a multi-language setting

J.M. Nobel; S. Puts; J. Weiss; H.J.W.L. Aerts; R.H. Mak; S.G.F. Robben; A.L.A.J. Dekker

doi:10.1186/s13244-021-01018-1

Corresponding author: J.M. Nobel

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background: In the era of datafication, it is important that medical data are accurate and structured for multiple applications. Data for oncological staging in particular need to be accurate, both to stage and treat individual patients and to support population-level surveillance and outcome assessment. To support data extraction from free-text radiological reports, a Dutch natural language processing (NLP) algorithm was built to quantify the T-stage of pulmonary tumors according to the tumor node metastasis (TNM) classification. This structuring tool was translated and validated on English radiological free-text reports.

Methods: A rule-based algorithm to classify T-stage was trained and validated on 200 and 225 English free-text radiological reports, respectively, from diagnostic computed tomography (CT) obtained for staging of patients with lung cancer. The automated T-stage extracted by the algorithm from the report was compared to manual staging. A graphical user interface was built for training purposes to visualize the results of the algorithm by highlighting the extracted concepts and their modifying context.

Results: Accuracy of the T-stage classifier was 0.89 in the validation set, 0.84 when considering the T-substages, and 0.76 when only considering tumor size. Results were comparable with the Dutch results (0.88, 0.89 and 0.79, respectively). Most errors arose from ambiguities that the rule-based algorithm could not resolve.

Conclusions: NLP can be successfully applied for staging lung cancer from free-text radiological reports in different languages. Machine learning should be introduced in a focused manner, as part of a hybrid approach, to improve performance.
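The abstract does not spell out the rules themselves, so the following is only a minimal sketch of what a size-only, rule-based T-stage classifier and its accuracy check could look like. It is not the authors' algorithm. The size thresholds follow the TNM 8th edition for lung cancer, which is an assumption, since the abstract does not name the edition used. All names (`SIZE_RULES`, `largest_size_cm`, `t_stage_from_size`, `accuracy`) and the regular expression are hypothetical.

```python
import re
from typing import List, Optional

# Upper size bounds in cm for each T category, size criterion only.
# Assumption: TNM 8th edition thresholds; the abstract does not name the edition.
SIZE_RULES = [
    (1.0, "T1a"),
    (2.0, "T1b"),
    (3.0, "T1c"),
    (4.0, "T2a"),
    (5.0, "T2b"),
    (7.0, "T3"),
]

# Hypothetical pattern: a number followed by a unit, e.g. "34 mm" or "3.4 cm".
SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm)\b", re.IGNORECASE)


def largest_size_cm(report: str) -> Optional[float]:
    """Return the largest measurement mentioned in the report, in cm."""
    sizes = []
    for value, unit in SIZE_PATTERN.findall(report):
        size = float(value)
        if unit.lower() == "mm":
            size /= 10.0  # convert millimetres to centimetres
        sizes.append(size)
    return max(sizes) if sizes else None


def t_stage_from_size(size_cm: float) -> str:
    """Map a tumor size in cm to a T category using size alone."""
    for upper_bound, stage in SIZE_RULES:
        if size_cm <= upper_bound:
            return stage
    return "T4"  # larger than 7 cm


def accuracy(predicted: List[str], reference: List[str]) -> float:
    """Fraction of reports where the automated stage equals the manual stage."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)
```

For example, `t_stage_from_size(largest_size_cm("Mass of 34 mm in the right upper lobe."))` yields `"T2a"` under these thresholds. A sketch like this takes the largest number it finds and ignores negation, prior-study comparisons, and non-size criteria such as invasion of adjacent structures, which may be the kind of ambiguity the abstract reports as the main source of errors.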

Original language	English
Article number	77
Number of pages	11
Journal	Insights into Imaging
Volume	12
Issue number	1
DOIs	https://doi.org/10.1186/s13244-021-01018-1
Publication status	Published - 10 Jun 2021

Keywords

Radiology
Reporting
Natural language processing
Free-text
Classification system
INFORMATION

Access to Document

10.1186/s13244-021-01018-1 (Licence: CC BY)

Cite this

@article{47e246ac86a84d59a5793a7f5829f5f6,

title = "T-staging pulmonary oncology from radiological reports using natural language processing: translating into a multi-language setting",

keywords = "Radiology, Reporting, Natural language processing, Free-text, Classification system, INFORMATION",

author = "J.M. Nobel and S. Puts and J. Weiss and H.J.W.L. Aerts and R.H. Mak and S.G.F. Robben and A.L.A.J. Dekker",

year = "2021",

month = jun,

day = "10",

doi = "10.1186/s13244-021-01018-1",

language = "English",

volume = "12",

journal = "Insights into Imaging",

issn = "1869-4101",

publisher = "SpringerOpen",

number = "1",

}

TY - JOUR

T1 - T-staging pulmonary oncology from radiological reports using natural language processing: translating into a multi-language setting

AU - Nobel, J.M.

AU - Puts, S.

AU - Weiss, J.

AU - Aerts, H.J.W.L.

AU - Mak, R.H.

AU - Robben, S.G.F.

AU - Dekker, A.L.A.J.

PY - 2021/6/10

Y1 - 2021/6/10

KW - Radiology

KW - Reporting

KW - Natural language processing

KW - Free-text

KW - Classification system

KW - INFORMATION

U2 - 10.1186/s13244-021-01018-1

DO - 10.1186/s13244-021-01018-1

M3 - Article

C2 - 34114076

SN - 1869-4101

VL - 12

JO - Insights into Imaging

JF - Insights into Imaging

IS - 1

M1 - 77

ER -