Bayesian negative binomial mixture regression models for the analysis of sequence count and methylation data

Research output: Contribution to journal › Article › Academic › peer-review

Standard

Bayesian negative binomial mixture regression models for the analysis of sequence count and methylation data. / Li, Qiwei; Cassese, Alberto; Guindani, Michele; Vannucci, Marina.

In: Biometrics, Vol. 75, No. 1, 03.2019, p. 183-192.

Author

Li, Qiwei ; Cassese, Alberto ; Guindani, Michele ; Vannucci, Marina. / Bayesian negative binomial mixture regression models for the analysis of sequence count and methylation data. In: Biometrics. 2019 ; Vol. 75, No. 1. pp. 183-192.

Bibtex

@article{efba6bbf03d641108d23390835b01e54,
title = "Bayesian negative binomial mixture regression models for the analysis of sequence count and methylation data",
abstract = "In this article, we develop a Bayesian hierarchical mixture regression model for studying the association between a multivariate response, measured as counts on a set of features, and a set of covariates. We have available RNA-Seq and DNA methylation data measured on breast cancer patients at different stages of the disease. We account for the heterogeneity and over-dispersion of count data (here, RNA-Seq data) by considering a mixture of negative binomial distributions and incorporate the covariates (here, methylation data) into the model via a linear modeling construction on the mean components. Our modeling construction includes several innovative characteristics. First, it employs selection techniques that allow the identification of a small subset of features that best discriminate the samples while simultaneously selecting a set of covariates associated to each feature. Second, it incorporates known dependencies into the feature selection process via the use of Markov random field (MRF) priors. On simulated data, we show how incorporating existing information via the prior model can improve the accuracy of feature selection. In the analysis of RNA-Seq and DNA methylation data on breast cancer, we incorporate knowledge on relationships among genes via a gene-gene network, which we extract from the KEGG database. Our data analysis identifies genes which are discriminatory of cancer stages and simultaneously selects significant associations between those genes and DNA methylation sites. A biological interpretation of our findings reveals several biomarkers that can help understanding the effect of DNA methylation on gene expression transcription across cancer stages.",
keywords = "Count data, DIFFERENTIAL EXPRESSION ANALYSIS, Feature selection, GENE-EXPRESSION, Integrative analysis, Markov random field, Mixture models, Negative binomial, RNA-SEQ, VARIABLE SELECTION",
author = "Qiwei Li and Alberto Cassese and Michele Guindani and Marina Vannucci",
note = "This article is protected by copyright. All rights reserved.",
year = "2019",
month = "3",
doi = "10.1111/biom.12962",
language = "English",
volume = "75",
pages = "183--192",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",
number = "1",

}

RIS

TY - JOUR

T1 - Bayesian negative binomial mixture regression models for the analysis of sequence count and methylation data

AU - Li, Qiwei

AU - Cassese, Alberto

AU - Guindani, Michele

AU - Vannucci, Marina

N1 - This article is protected by copyright. All rights reserved.

PY - 2019/3

Y1 - 2019/3

AB - In this article, we develop a Bayesian hierarchical mixture regression model for studying the association between a multivariate response, measured as counts on a set of features, and a set of covariates. We have available RNA-Seq and DNA methylation data measured on breast cancer patients at different stages of the disease. We account for the heterogeneity and over-dispersion of count data (here, RNA-Seq data) by considering a mixture of negative binomial distributions and incorporate the covariates (here, methylation data) into the model via a linear modeling construction on the mean components. Our modeling construction includes several innovative characteristics. First, it employs selection techniques that allow the identification of a small subset of features that best discriminate the samples while simultaneously selecting a set of covariates associated to each feature. Second, it incorporates known dependencies into the feature selection process via the use of Markov random field (MRF) priors. On simulated data, we show how incorporating existing information via the prior model can improve the accuracy of feature selection. In the analysis of RNA-Seq and DNA methylation data on breast cancer, we incorporate knowledge on relationships among genes via a gene-gene network, which we extract from the KEGG database. Our data analysis identifies genes which are discriminatory of cancer stages and simultaneously selects significant associations between those genes and DNA methylation sites. A biological interpretation of our findings reveals several biomarkers that can help understanding the effect of DNA methylation on gene expression transcription across cancer stages.

KW - Count data

KW - DIFFERENTIAL EXPRESSION ANALYSIS

KW - Feature selection

KW - GENE-EXPRESSION

KW - Integrative analysis

KW - Markov random field

KW - Mixture models

KW - Negative binomial

KW - RNA-SEQ

KW - VARIABLE SELECTION

U2 - 10.1111/biom.12962

DO - 10.1111/biom.12962

M3 - Article

VL - 75

SP - 183

EP - 192

JO - Biometrics

T2 - Biometrics

JF - Biometrics

SN - 0006-341X

IS - 1

ER -