Whole Genome Prediction of Bladder Cancer Risk With the Bayesian LASSO

Evangelina Lopez de Maturana; Stephen J. Chanock; Antoni C. Picornell; Nathaniel Rothman; Jesus Herranz; M. Luz Calle; Montserrat Garcia-Closas; Gaelle Marenne; Angela Brand; Adonina Tardon; Alfredo Carrato; Debra T. Silverman; Manolis Kogevinas; Daniel Gianola; Francisco X. Real; Nuria Malats

doi:10.1002/gepi.21809

Corresponding author: Nuria Malats

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

To build a predictive model for urothelial carcinoma of the bladder (UCB) risk combining both genomic and nongenomic data, 1,127 cases and 1,090 controls from the Spanish Bladder Cancer/EPICURO study were genotyped using the HumanHap 1M SNP array. After quality control filters, genotypes from 475,290 variants were available. Nongenomic information comprised age, gender, region, and smoking status. Three Bayesian threshold models were implemented including: (1) only genomic information, (2) only nongenomic data, and (3) both sources of information. The three models were applied to the whole population, to only nonsmokers, to male smokers, and to extreme phenotypes to potentiate the UCB genetic component. The area under the ROC curve allowed evaluating the predictive ability of each model in a 10-fold cross-validation scenario. Smoking status showed the highest predictive ability of UCB risk (AUC(test) = 0.62). On the other hand, the AUC of all genetic variants was poorer (0.53). When the extreme phenotype approach was applied, the predictive ability of the genomic model improved 15%. This study represents a first attempt to build a predictive model for UCB risk combining both genomic and nongenomic data and applying state-of-the-art statistical approaches. However, the lack of genetic relatedness among individuals, the complexity of UCB etiology, as well as a relatively small statistical power, may explain the low predictive ability for UCB risk. The study confirms the difficulty of predicting complex diseases using genetic data, and suggests the limited translational potential of findings from this type of data into public health interventions. Genet Epidemiol 38: 467-476, 2014.
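The evaluation scheme named in the abstract (test-set AUC averaged over a 10-fold cross-validation) can be sketched in a few lines. This is only an illustration under stated assumptions: it does not implement the Bayesian LASSO or the threshold models of the study, the "model" is a toy score (the case rate observed in the training folds for each value of a single categorical covariate), and the function names `auc`, `k_fold_indices`, and `cross_validated_auc` are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_auc(x, y, k=10, seed=0):
    """Mean test-fold AUC for a toy score built from one covariate.

    x: categorical covariate per individual (an assumption for this sketch)
    y: 1 for case, 0 for control
    """
    fold_aucs = []
    for test in k_fold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        # "Fit" on the training folds only: case rate per covariate value.
        rate = {}
        for value in set(x):
            group = [y[i] for i in train if x[i] == value]
            rate[value] = sum(group) / len(group) if group else 0.5
        # Score the held-out fold and record its AUC.
        fold_aucs.append(auc([rate[x[i]] for i in test], [y[i] for i in test]))
    return sum(fold_aucs) / len(fold_aucs)
```

Calling `cross_validated_auc(exposure, status, k=10)` with a list of covariate values and a list of 0/1 case indicators returns the average of the ten held-out AUCs. A value near 0.5 means the score ranks cases and controls no better than chance.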

Original language	English
Pages (from-to)	467-476
Journal	Genetic Epidemiology
Volume	38
Issue number	5
DOIs	https://doi.org/10.1002/gepi.21809
Publication status	Published - Jul 2014

Keywords

Bayesian shrinkage method
area under the ROC curve
urothelial carcinoma of the bladder
genomic predictive model

Cite this

Lopez de Maturana, E., Chanock, S. J., Picornell, A. C., Rothman, N., Herranz, J., Luz Calle, M., Garcia-Closas, M., Marenne, G., Brand, A., Tardon, A., Carrato, A., Silverman, D. T., Kogevinas, M., Gianola, D., Real, F. X., & Malats, N. (2014). Whole Genome Prediction of Bladder Cancer Risk With the Bayesian LASSO. Genetic Epidemiology, 38(5), 467-476. https://doi.org/10.1002/gepi.21809

@article{451c5caffa3c4ed9ae2dd4948d9b8da4,

title = "Whole Genome Prediction of Bladder Cancer Risk With the Bayesian LASSO",

abstract = "To build a predictive model for urothelial carcinoma of the bladder (UCB) risk combining both genomic and nongenomic data, 1,127 cases and 1,090 controls from the Spanish Bladder Cancer/EPICURO study were genotyped using the HumanHap 1M SNP array. After quality control filters, genotypes from 475,290 variants were available. Nongenomic information comprised age, gender, region, and smoking status. Three Bayesian threshold models were implemented including: (1) only genomic information, (2) only nongenomic data, and (3) both sources of information. The three models were applied to the whole population, to only nonsmokers, to male smokers, and to extreme phenotypes to potentiate the UCB genetic component. The area under the ROC curve allowed evaluating the predictive ability of each model in a 10-fold cross-validation scenario. Smoking status showed the highest predictive ability of UCB risk (AUC(test) = 0.62). On the other hand, the AUC of all genetic variants was poorer (0.53). When the extreme phenotype approach was applied, the predictive ability of the genomic model improved 15%. This study represents a first attempt to build a predictive model for UCB risk combining both genomic and nongenomic data and applying state-of-the-art statistical approaches. However, the lack of genetic relatedness among individuals, the complexity of UCB etiology, as well as a relatively small statistical power, may explain the low predictive ability for UCB risk. The study confirms the difficulty of predicting complex diseases using genetic data, and suggests the limited translational potential of findings from this type of data into public health interventions. Genet Epidemiol 38: 467-476, 2014. ",

keywords = "Bayesian shrinkage method, area under the ROC curve, urothelial carcinoma of the bladder, genomic predictive model",

author = "{Lopez de Maturana}, Evangelina and Chanock, {Stephen J.} and Picornell, {Antoni C.} and Nathaniel Rothman and Jesus Herranz and {Luz Calle}, M. and Montserrat Garcia-Closas and Gaelle Marenne and Angela Brand and Adonina Tardon and Alfredo Carrato and Silverman, {Debra T.} and Manolis Kogevinas and Daniel Gianola and Real, {Francisco X.} and Nuria Malats",

year = "2014",

month = jul,

doi = "10.1002/gepi.21809",

language = "English",

volume = "38",

pages = "467--476",

journal = "Genetic Epidemiology",

issn = "0741-0395",

publisher = "Wiley-Blackwell",

number = "5",

}

Lopez de Maturana, E, Chanock, SJ, Picornell, AC, Rothman, N, Herranz, J, Luz Calle, M, Garcia-Closas, M, Marenne, G, Brand, A, Tardon, A, Carrato, A, Silverman, DT, Kogevinas, M, Gianola, D, Real, FX & Malats, N 2014, 'Whole Genome Prediction of Bladder Cancer Risk With the Bayesian LASSO', Genetic Epidemiology, vol. 38, no. 5, pp. 467-476. https://doi.org/10.1002/gepi.21809

TY - JOUR

T1 - Whole Genome Prediction of Bladder Cancer Risk With the Bayesian LASSO

AU - Lopez de Maturana, Evangelina

AU - Chanock, Stephen J.

AU - Picornell, Antoni C.

AU - Rothman, Nathaniel

AU - Herranz, Jesus

AU - Luz Calle, M.

AU - Garcia-Closas, Montserrat

AU - Marenne, Gaelle

AU - Brand, Angela

AU - Tardon, Adonina

AU - Carrato, Alfredo

AU - Silverman, Debra T.

AU - Kogevinas, Manolis

AU - Gianola, Daniel

AU - Real, Francisco X.

AU - Malats, Nuria

PY - 2014/7

Y1 - 2014/7

N2 - To build a predictive model for urothelial carcinoma of the bladder (UCB) risk combining both genomic and nongenomic data, 1,127 cases and 1,090 controls from the Spanish Bladder Cancer/EPICURO study were genotyped using the HumanHap 1M SNP array. After quality control filters, genotypes from 475,290 variants were available. Nongenomic information comprised age, gender, region, and smoking status. Three Bayesian threshold models were implemented including: (1) only genomic information, (2) only nongenomic data, and (3) both sources of information. The three models were applied to the whole population, to only nonsmokers, to male smokers, and to extreme phenotypes to potentiate the UCB genetic component. The area under the ROC curve allowed evaluating the predictive ability of each model in a 10-fold cross-validation scenario. Smoking status showed the highest predictive ability of UCB risk (AUC(test) = 0.62). On the other hand, the AUC of all genetic variants was poorer (0.53). When the extreme phenotype approach was applied, the predictive ability of the genomic model improved 15%. This study represents a first attempt to build a predictive model for UCB risk combining both genomic and nongenomic data and applying state-of-the-art statistical approaches. However, the lack of genetic relatedness among individuals, the complexity of UCB etiology, as well as a relatively small statistical power, may explain the low predictive ability for UCB risk. The study confirms the difficulty of predicting complex diseases using genetic data, and suggests the limited translational potential of findings from this type of data into public health interventions. Genet Epidemiol 38: 467-476, 2014.

AB - To build a predictive model for urothelial carcinoma of the bladder (UCB) risk combining both genomic and nongenomic data, 1,127 cases and 1,090 controls from the Spanish Bladder Cancer/EPICURO study were genotyped using the HumanHap 1M SNP array. After quality control filters, genotypes from 475,290 variants were available. Nongenomic information comprised age, gender, region, and smoking status. Three Bayesian threshold models were implemented including: (1) only genomic information, (2) only nongenomic data, and (3) both sources of information. The three models were applied to the whole population, to only nonsmokers, to male smokers, and to extreme phenotypes to potentiate the UCB genetic component. The area under the ROC curve allowed evaluating the predictive ability of each model in a 10-fold cross-validation scenario. Smoking status showed the highest predictive ability of UCB risk (AUC(test) = 0.62). On the other hand, the AUC of all genetic variants was poorer (0.53). When the extreme phenotype approach was applied, the predictive ability of the genomic model improved 15%. This study represents a first attempt to build a predictive model for UCB risk combining both genomic and nongenomic data and applying state-of-the-art statistical approaches. However, the lack of genetic relatedness among individuals, the complexity of UCB etiology, as well as a relatively small statistical power, may explain the low predictive ability for UCB risk. The study confirms the difficulty of predicting complex diseases using genetic data, and suggests the limited translational potential of findings from this type of data into public health interventions. Genet Epidemiol 38: 467-476, 2014.

KW - Bayesian shrinkage method

KW - area under the ROC curve

KW - urothelial carcinoma of the bladder

KW - genomic predictive model

U2 - 10.1002/gepi.21809

DO - 10.1002/gepi.21809

M3 - Article

C2 - 24796258

SN - 0741-0395

VL - 38

SP - 467

EP - 476

JO - Genetic Epidemiology

JF - Genetic Epidemiology

IS - 5

ER -