Comparison of Bayesian network and support vector machine models for two-year survival prediction in lung cancer patients treated with radiotherapy

K. Jayasurya^*; G. Fung; S. Yu; Cary Dehing-Oberije; D. De Ruysscher; Gena A. Hope; Wilfried De Neve; Yolande Lievens; P. Lambin; A. L. A. J. Dekker

^*Corresponding author for this work

doi:10.1118/1.3352709

Research output: Contribution to journal, article (academic, peer-reviewed)

Abstract

Purpose: Classic statistical and machine learning models such as support vector machines (SVMs) can be used to predict cancer outcome, but they often perform well only if all input variables are known, which is unlikely in the medical domain. Bayesian network (BN) models have a natural ability to reason under uncertainty and might handle missing data better. In this study, the authors hypothesize that a BN model can predict two-year survival in non-small cell lung cancer (NSCLC) patients as accurately as an SVM, but will predict survival more accurately when data are missing.

Methods: A BN model and an SVM model were trained on 322 inoperable NSCLC patients treated with radiotherapy in Maastricht and validated in three independent data sets of 35, 47, and 33 patients from Ghent, Leuven, and Toronto. Variables were missing in the data sets, with only 37, 28, and 24 patients having complete data.

Results: BN structure and parameter learning identified gross tumor volume (GTV) size, performance status, and the number of positive lymph nodes on PET as prognostic factors for two-year survival. When validated on the full Ghent, Leuven, and Toronto validation sets, the BN model achieved AUCs of 0.77, 0.72, and 0.70, respectively. An SVM model based on the same variables performed worse overall (AUCs of 0.71, 0.68, and 0.69), especially in the Ghent set, which had the highest percentage of missing values for the important GTV size variable. When only patients with complete data were considered, the BN and SVM models performed more similarly.

Conclusions: Within the limitations of this study, the results support the hypothesis that BN models are better at handling missing data than SVM models and are therefore more suitable for the medical domain. Future work should focus on improving BN performance by including more patients, more variables, and greater diversity.
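The abstract's central argument is that a BN can still produce a prediction when some inputs are missing, because unobserved variables are summed out of the joint distribution. A standard SVM decision function, by contrast, is evaluated on a complete feature vector, so missing inputs must be imputed or the patient excluded. The Python sketch below illustrates only the BN side of that argument, together with the rank-based AUC used to compare the models. It assumes a hypothetical naive-Bayes structure (two-year survival as the sole parent of discretized GTV size, performance status, and positive lymph node count). The names `PRIOR`, `CPT`, `joint`, `p_survival`, and `auc`, the state labels, and every probability are invented for illustration. They are not the structure or parameters learned in the study.

```python
from itertools import product

# HYPOTHETICAL BN with a naive-Bayes structure: Survival -> gtv, ps, ln.
# All states and probabilities are invented for illustration; they are NOT the study's model.
PRIOR = {True: 0.3, False: 0.7}  # P(two-year survival)
CPT = {
    # variable -> state -> (P(state | survived), P(state | did not survive))
    "gtv": {"small": (0.7, 0.3), "large": (0.3, 0.7)},
    "ps": {"good": (0.8, 0.5), "poor": (0.2, 0.5)},
    "ln": {"0-1": (0.6, 0.4), "2+": (0.4, 0.6)},
}


def joint(survived, assignment):
    """P(Survival = survived, variables = assignment) under the BN factorization."""
    column = 0 if survived else 1
    p = PRIOR[survived]
    for var, state in assignment.items():
        p *= CPT[var][state][column]
    return p


def p_survival(evidence):
    """P(survived | evidence); variables absent from evidence or set to None are summed out."""
    observed = {v: s for v, s in evidence.items() if s is not None}
    missing = [v for v in CPT if v not in observed]
    totals = {True: 0.0, False: 0.0}
    for survived in (True, False):
        # Enumerate every combination of states for the unobserved variables.
        for states in product(*(CPT[v] for v in missing)):
            assignment = {**observed, **dict(zip(missing, states))}
            totals[survived] += joint(survived, assignment)
    return totals[True] / (totals[True] + totals[False])


def auc(scores, labels):
    """Rank-based AUC: P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    complete = {"gtv": "small", "ps": "good", "ln": "0-1"}
    gtv_missing = {"gtv": None, "ps": "good", "ln": "0-1"}
    print(round(p_survival(complete), 3))     # 0.706
    print(round(p_survival(gtv_missing), 3))  # 0.507
```

With these made-up numbers, a patient with small GTV, good performance status, and 0-1 positive nodes gets a survival probability of about 0.71. If GTV size is missing, the same call still returns a prediction (about 0.51) by marginalizing over both GTV states, without imputation. In this naive-Bayes structure, summing out a missing child is equivalent to dropping its likelihood term. With a richer learned structure, the enumeration in `p_survival` would follow the learned factorization instead.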

Original language	English
Pages (from-to)	1401-1407
Journal	Medical Physics
Volume	37
Issue number	4
DOIs	https://doi.org/10.1118/1.3352709
Publication status	Published - Apr 2010

Keywords

belief networks
cancer
learning (artificial intelligence)
lung
parameter estimation
positron emission tomography
radiation therapy
support vector machines
tumours

Access to Document

Full text: final published version, 575 KB. Licence: Taverne

Cite this

@article{e0a8cc1bdc614a87ad73fa4773d76afc,
title = "Comparison of {Bayesian} network and support vector machine models for two-year survival prediction in lung cancer patients treated with radiotherapy",
abstract = "Purpose: Classic statistical and machine learning models such as support vector machines (SVMs) can be used to predict cancer outcome, but they often perform well only if all input variables are known, which is unlikely in the medical domain. Bayesian network (BN) models have a natural ability to reason under uncertainty and might handle missing data better. In this study, the authors hypothesize that a BN model can predict two-year survival in non-small cell lung cancer (NSCLC) patients as accurately as an SVM, but will predict survival more accurately when data are missing. Methods: A BN model and an SVM model were trained on 322 inoperable NSCLC patients treated with radiotherapy in Maastricht and validated in three independent data sets of 35, 47, and 33 patients from Ghent, Leuven, and Toronto. Variables were missing in the data sets, with only 37, 28, and 24 patients having complete data. Results: BN structure and parameter learning identified gross tumor volume (GTV) size, performance status, and the number of positive lymph nodes on PET as prognostic factors for two-year survival. When validated on the full Ghent, Leuven, and Toronto validation sets, the BN model achieved AUCs of 0.77, 0.72, and 0.70, respectively. An SVM model based on the same variables performed worse overall (AUCs of 0.71, 0.68, and 0.69), especially in the Ghent set, which had the highest percentage of missing values for the important GTV size variable. When only patients with complete data were considered, the BN and SVM models performed more similarly. Conclusions: Within the limitations of this study, the results support the hypothesis that BN models are better at handling missing data than SVM models and are therefore more suitable for the medical domain. Future work should focus on improving BN performance by including more patients, more variables, and greater diversity.",
keywords = "belief networks, cancer, learning (artificial intelligence), lung, parameter estimation, positron emission tomography, radiation therapy, support vector machines, tumours",
author = "Jayasurya, K. and Fung, G. and Yu, S. and Dehing-Oberije, Cary and {De Ruysscher}, D. and Hope, Gena A. and {De Neve}, Wilfried and Lievens, Yolande and Lambin, P. and Dekker, A. L. A. J.",
year = "2010",
month = apr,
doi = "10.1118/1.3352709",
language = "English",
volume = "37",
pages = "1401--1407",
journal = "Medical Physics",
issn = "0094-2405",
publisher = "Wiley",
number = "4",
}

TY  - JOUR
T1  - Comparison of Bayesian network and support vector machine models for two-year survival prediction in lung cancer patients treated with radiotherapy
AU  - Jayasurya, K.
AU  - Fung, G.
AU  - Yu, S.
AU  - Dehing-Oberije, Cary
AU  - De Ruysscher, D.
AU  - Hope, Gena A.
AU  - De Neve, Wilfried
AU  - Lievens, Yolande
AU  - Lambin, P.
AU  - Dekker, A. L. A. J.
PY  - 2010/4
Y1  - 2010/4
AB  - Purpose: Classic statistical and machine learning models such as support vector machines (SVMs) can be used to predict cancer outcome, but they often perform well only if all input variables are known, which is unlikely in the medical domain. Bayesian network (BN) models have a natural ability to reason under uncertainty and might handle missing data better. In this study, the authors hypothesize that a BN model can predict two-year survival in non-small cell lung cancer (NSCLC) patients as accurately as an SVM, but will predict survival more accurately when data are missing. Methods: A BN model and an SVM model were trained on 322 inoperable NSCLC patients treated with radiotherapy in Maastricht and validated in three independent data sets of 35, 47, and 33 patients from Ghent, Leuven, and Toronto. Variables were missing in the data sets, with only 37, 28, and 24 patients having complete data. Results: BN structure and parameter learning identified gross tumor volume (GTV) size, performance status, and the number of positive lymph nodes on PET as prognostic factors for two-year survival. When validated on the full Ghent, Leuven, and Toronto validation sets, the BN model achieved AUCs of 0.77, 0.72, and 0.70, respectively. An SVM model based on the same variables performed worse overall (AUCs of 0.71, 0.68, and 0.69), especially in the Ghent set, which had the highest percentage of missing values for the important GTV size variable. When only patients with complete data were considered, the BN and SVM models performed more similarly. Conclusions: Within the limitations of this study, the results support the hypothesis that BN models are better at handling missing data than SVM models and are therefore more suitable for the medical domain. Future work should focus on improving BN performance by including more patients, more variables, and greater diversity.
KW  - belief networks
KW  - cancer
KW  - learning (artificial intelligence)
KW  - lung
KW  - parameter estimation
KW  - positron emission tomography
KW  - radiation therapy
KW  - support vector machines
KW  - tumours
U2  - 10.1118/1.3352709
DO  - 10.1118/1.3352709
M3  - Article
C2  - 20443461
SN  - 0094-2405
VL  - 37
SP  - 1401
EP  - 1407
JO  - Medical Physics
JF  - Medical Physics
IS  - 4
ER  - 