TY - JOUR
T1 - Evaluation of commonly used analysis strategies for epigenome- and transcriptome-wide association studies through replication of large-scale population studies
AU - van Rooij, Jeroen
AU - Mandaviya, Pooja R.
AU - Claringbould, Annique
AU - Felix, Janine F.
AU - van Dongen, Jenny
AU - Jansen, Rick
AU - Franke, Lude
AU - 't Hoen, Peter A. C.
AU - Heijmans, Bas
AU - van Meurs, Joyce B. J.
AU - BIOS Consortium
N1 - Publisher Copyright: © 2019 The Author(s).
PY - 2019/11/14
Y1 - 2019/11/14
N2 - Background: A large number of analysis strategies are available for DNA methylation (DNAm) array and RNA-seq datasets, but it is unclear which strategies are best to use. We compare commonly used strategies and report how they influence results in large cohort studies. Results: We tested the associations of DNAm and RNA expression with age, BMI, and smoking in four different cohorts (n ≈ 2900). By comparing strategies against the base model on the number and percentage of replicated CpGs for DNAm analyses or genes for RNA-seq analyses in a leave-one-out cohort replication approach, we find the choice of the normalization method and statistical test does not strongly influence the results for DNAm array data. However, adjusting for cell counts or hidden confounders substantially decreases the number of replicated CpGs for age and increases the number of replicated CpGs for BMI and smoking. For RNA-seq data, the choice of the normalization method, gene expression inclusion threshold, and statistical test does not strongly influence the results. Including five principal components or excluding correction of technical covariates or cell counts decreases the number of replicated genes. Conclusions: Results were not influenced by the normalization method or statistical test. However, the correction method for cell counts, technical covariates, principal components, and/or hidden confounders does influence the results.
AB - Background: A large number of analysis strategies are available for DNA methylation (DNAm) array and RNA-seq datasets, but it is unclear which strategies are best to use. We compare commonly used strategies and report how they influence results in large cohort studies. Results: We tested the associations of DNAm and RNA expression with age, BMI, and smoking in four different cohorts (n ≈ 2900). By comparing strategies against the base model on the number and percentage of replicated CpGs for DNAm analyses or genes for RNA-seq analyses in a leave-one-out cohort replication approach, we find the choice of the normalization method and statistical test does not strongly influence the results for DNAm array data. However, adjusting for cell counts or hidden confounders substantially decreases the number of replicated CpGs for age and increases the number of replicated CpGs for BMI and smoking. For RNA-seq data, the choice of the normalization method, gene expression inclusion threshold, and statistical test does not strongly influence the results. Including five principal components or excluding correction of technical covariates or cell counts decreases the number of replicated genes. Conclusions: Results were not influenced by the normalization method or statistical test. However, the correction method for cell counts, technical covariates, principal components, and/or hidden confounders does influence the results.
KW - Illumina 450k arrays
KW - DNA methylation
KW - EWAS
KW - RNA sequencing
KW - Differential gene expression
KW - TWAS
KW - Statistical methods comparison
KW - DIFFERENTIAL DNA METHYLATION
KW - NORMALIZATION METHODS
KW - BIOCONDUCTOR PACKAGE
KW - SMOKING
KW - METAANALYSIS
KW - SIGNATURES
KW - EXPOSURE
KW - IMPACT
KW - TIME
UR - https://springernature.figshare.com/articles/dataset/MOESM1_of_Evaluation_of_commonly_used_analysis_strategies_for_epigenome-_and_transcriptome-wide_association_studies_through_replication_of_large-scale_population_studies/10309967/1
U2 - 10.1186/s13059-019-1878-x
DO - 10.1186/s13059-019-1878-x
M3 - Article
C2 - 31727104
SN - 1474-760X
VL - 20
JO - Genome Biology
JF - Genome Biology
IS - 1
M1 - 235
ER -