An evaluation of two-channel ChIP-on-chip and DNA methylation microarray normalization strategies

M.E. Michiel Adriaens; M.N.L. Jaillard; L.M.T. Eijssen; C.D. Mayer; C.T.A. Evelo

doi:10.1186/1471-2164-13-42

M.E. Michiel Adriaens^*, M.N.L. Jaillard, L.M.T. Eijssen, C.D. Mayer, C.T.A. Evelo

^*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background: The combination of chromatin immunoprecipitation with two-channel microarray technology enables genome-wide mapping of binding sites of DNA-interacting proteins (ChIP-on-chip) or sites with methylated CpG di-nucleotides (DNA methylation microarray). These powerful tools are the gateway to understanding gene transcription regulation. Since the goals of such studies, the sample preparation procedures, the microarray content and study design are all different from transcriptomics microarrays, the data pre-processing strategies traditionally applied to transcriptomics microarrays may not be appropriate. Particularly, the main challenge of the normalization of "regulation microarrays" is (i) to make the data of individual microarrays quantitatively comparable and (ii) to keep the signals of the enriched probes, representing DNA sequences from the precipitate, as distinguishable as possible from the signals of the un-enriched probes, representing DNA sequences largely absent from the precipitate.

Results: We compare several widely used normalization approaches (VSN, LOWESS, quantile, T-quantile, Tukey's biweight scaling, Peng's method) applied to a selection of regulation microarray datasets, ranging from DNA methylation to transcription factor binding and histone modification studies. Through comparison of the data distributions of control probes and gene promoter probes before and after normalization, and assessment of the power to identify known enriched genomic regions after normalization, we demonstrate that there are clear differences in performance between normalization procedures.

Conclusion: T-quantile normalization applied separately on the channels and Tukey's biweight scaling outperform other methods in terms of the conservation of enriched and un-enriched signal separation, as well as in identification of genomic regions known to be enriched. T-quantile normalization is preferable as it additionally improves comparability between microarrays. In contrast, popular normalization approaches like quantile, LOWESS, Peng's method and VSN normalization alter the data distributions of regulation microarrays to such an extent that using these approaches will impact the reliability of the downstream analysis substantially.

Original language	English
Article number	42
Number of pages	14
Journal	BMC Genomics
Volume	13
Issue number	1
DOIs	https://doi.org/10.1186/1471-2164-13-42
Publication status	Published - 25 Jan 2012

Keywords

GENOME-WIDE ANALYSIS
DOSAGE COMPENSATION
X-CHROMOSOME
HUMAN CANCER
ARRAY DATA
MODEL
EXPRESSION
DISORDERS
IDENTIFY
GENES

Access to Document

10.1186/1471-2164-13-42 (Licence: CC BY)

Cite this

@article{fa329524cf574d2a997e51e24370a73c,

title = "An evaluation of two-channel ChIP-on-chip and DNA methylation microarray normalization strategies",

abstract = "Background: The combination of chromatin immunoprecipitation with two-channel microarray technology enables genome-wide mapping of binding sites of DNA-interacting proteins (ChIP-on-chip) or sites with methylated CpG di-nucleotides (DNA methylation microarray). These powerful tools are the gateway to understanding gene transcription regulation. Since the goals of such studies, the sample preparation procedures, the microarray content and study design are all different from transcriptomics microarrays, the data pre-processing strategies traditionally applied to transcriptomics microarrays may not be appropriate. Particularly, the main challenge of the normalization of {"}regulation microarrays{"} is (i) to make the data of individual microarrays quantitatively comparable and (ii) to keep the signals of the enriched probes, representing DNA sequences from the precipitate, as distinguishable as possible from the signals of the un-enriched probes, representing DNA sequences largely absent from the precipitate. Results: We compare several widely used normalization approaches (VSN, LOWESS, quantile, T-quantile, Tukey's biweight scaling, Peng's method) applied to a selection of regulation microarray datasets, ranging from DNA methylation to transcription factor binding and histone modification studies. Through comparison of the data distributions of control probes and gene promoter probes before and after normalization, and assessment of the power to identify known enriched genomic regions after normalization, we demonstrate that there are clear differences in performance between normalization procedures. Conclusion: T-quantile normalization applied separately on the channels and Tukey's biweight scaling outperform other methods in terms of the conservation of enriched and un-enriched signal separation, as well as in identification of genomic regions known to be enriched. T-quantile normalization is preferable as it additionally improves comparability between microarrays. In contrast, popular normalization approaches like quantile, LOWESS, Peng's method and VSN normalization alter the data distributions of regulation microarrays to such an extent that using these approaches will impact the reliability of the downstream analysis substantially.",

keywords = "GENOME-WIDE ANALYSIS, DOSAGE COMPENSATION, X-CHROMOSOME, HUMAN CANCER, ARRAY DATA, MODEL, EXPRESSION, DISORDERS, IDENTIFY, GENES",

author = "{Michiel Adriaens}, M.E. and M.N.L. Jaillard and L.M.T. Eijssen and C.D. Mayer and C.T.A. Evelo",

year = "2012",

month = jan,

day = "25",

doi = "10.1186/1471-2164-13-42",

language = "English",

volume = "13",

journal = "BMC Genomics",

issn = "1471-2164",

publisher = "BioMed Central Ltd",

number = "1",

pages = "42",

}

TY - JOUR

T1 - An evaluation of two-channel ChIP-on-chip and DNA methylation microarray normalization strategies

AU - Michiel Adriaens, M.E.

AU - Jaillard, M.N.L.

AU - Eijssen, L.M.T.

AU - Mayer, C.D.

AU - Evelo, C.T.A.

PY - 2012/1/25

Y1 - 2012/1/25

AB - Background: The combination of chromatin immunoprecipitation with two-channel microarray technology enables genome-wide mapping of binding sites of DNA-interacting proteins (ChIP-on-chip) or sites with methylated CpG di-nucleotides (DNA methylation microarray). These powerful tools are the gateway to understanding gene transcription regulation. Since the goals of such studies, the sample preparation procedures, the microarray content and study design are all different from transcriptomics microarrays, the data pre-processing strategies traditionally applied to transcriptomics microarrays may not be appropriate. Particularly, the main challenge of the normalization of "regulation microarrays" is (i) to make the data of individual microarrays quantitatively comparable and (ii) to keep the signals of the enriched probes, representing DNA sequences from the precipitate, as distinguishable as possible from the signals of the un-enriched probes, representing DNA sequences largely absent from the precipitate. Results: We compare several widely used normalization approaches (VSN, LOWESS, quantile, T-quantile, Tukey's biweight scaling, Peng's method) applied to a selection of regulation microarray datasets, ranging from DNA methylation to transcription factor binding and histone modification studies. Through comparison of the data distributions of control probes and gene promoter probes before and after normalization, and assessment of the power to identify known enriched genomic regions after normalization, we demonstrate that there are clear differences in performance between normalization procedures. Conclusion: T-quantile normalization applied separately on the channels and Tukey's biweight scaling outperform other methods in terms of the conservation of enriched and un-enriched signal separation, as well as in identification of genomic regions known to be enriched. T-quantile normalization is preferable as it additionally improves comparability between microarrays. In contrast, popular normalization approaches like quantile, LOWESS, Peng's method and VSN normalization alter the data distributions of regulation microarrays to such an extent that using these approaches will impact the reliability of the downstream analysis substantially.

KW - GENOME-WIDE ANALYSIS

KW - DOSAGE COMPENSATION

KW - X-CHROMOSOME

KW - HUMAN CANCER

KW - ARRAY DATA

KW - MODEL

KW - EXPRESSION

KW - DISORDERS

KW - IDENTIFY

KW - GENES

U2 - 10.1186/1471-2164-13-42

DO - 10.1186/1471-2164-13-42

M3 - Article

SN - 1471-2164

VL - 13

JO - BMC Genomics

JF - BMC Genomics

IS - 1

M1 - 42

ER -