Ensemble analyses improve signatures of tumour hypoxia and reveal inter-platform differences

Natalie S. Fox; Maud H. W. Starmans; Syed Haider; Philippe Lambin; Paul C. Boutros

doi:10.1186/1471-2105-15-170

Natalie S. Fox, Maud H. W. Starmans, Syed Haider, Philippe Lambin, Paul C. Boutros^*

^*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background: The reproducibility of transcriptomic biomarkers across datasets remains poor, limiting clinical application. We and others have suggested that this is in-part caused by differential error-structure between datasets, and their incomplete removal by pre-processing algorithms.

Methods: To test this hypothesis, we systematically assessed the effects of pre-processing on biomarker classification using 24 different pre-processing methods and 15 distinct signatures of tumour hypoxia in 10 datasets (2,143 patients).

Results: We confirm strong pre-processing effects for all datasets and signatures, and find that these differ between microarray versions. Importantly, exploiting different pre-processing techniques in an ensemble technique improved classification for a majority of signatures.

Conclusions: Assessing biomarkers using an ensemble of pre-processing techniques shows clear value across multiple diseases, datasets and biomarkers. Importantly, ensemble classification improves biomarkers with initially good results but does not result in spuriously improved performance for poor biomarkers. While further research is required, this approach has the potential to become a standard for transcriptomic biomarkers.
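The ensemble idea in the abstract, classifying each patient once per pre-processing method and then combining the calls, could be sketched as below. The abstract does not say how the individual classifications are combined, so this sketch assumes a simple majority vote. The function name `ensemble_vote`, the method names, and the "high"/"low" labels are hypothetical, and the example data are made up for illustration rather than taken from the study.

```python
from collections import Counter


def ensemble_vote(calls_by_method):
    """Combine per-patient class calls from several pre-processing methods.

    calls_by_method maps a (hypothetical) method name to a list of labels,
    one per patient, with every list in the same patient order.
    Returns one label per patient: the most common call, or None on a tie.
    Majority voting is an assumption of this sketch, not the paper's rule.
    """
    per_method = list(calls_by_method.values())
    if not per_method:
        return []
    n_patients = len(per_method[0])
    if any(len(calls) != n_patients for calls in per_method):
        raise ValueError("all methods must classify the same patients")

    consensus = []
    # zip(*...) regroups the calls so each tuple holds one patient's labels
    for patient_calls in zip(*per_method):
        ranked = Counter(patient_calls).most_common(2)
        tied = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
        consensus.append(None if tied else ranked[0][0])
    return consensus


# Illustrative, made-up calls for four patients under three methods
calls = {
    "method_a": ["high", "low", "low", "high"],
    "method_b": ["high", "low", "high", "low"],
    "method_c": ["low", "low", "high", "high"],
}
print(ensemble_vote(calls))
```

Returning None for a tie keeps ambiguous patients visible instead of assigning them silently to one class; other combination rules, such as requiring unanimity, would slot into the same loop.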

Original language	English
Article number	170
Journal	BMC Bioinformatics
Volume	15
DOIs	https://doi.org/10.1186/1471-2105-15-170
Publication status	Published - 6 Jun 2014

Access to Document

10.1186/1471-2105-15-170 (Licence: CC BY)

Cite this

@article{91093a82fd8c46458c908fd5358be248,

title = "Ensemble analyses improve signatures of tumour hypoxia and reveal inter-platform differences",

abstract = "Background: The reproducibility of transcriptomic biomarkers across datasets remains poor, limiting clinical application. We and others have suggested that this is in-part caused by differential error-structure between datasets, and their incomplete removal by pre-processing algorithms. Methods: To test this hypothesis, we systematically assessed the effects of pre-processing on biomarker classification using 24 different pre-processing methods and 15 distinct signatures of tumour hypoxia in 10 datasets (2,143 patients). Results: We confirm strong pre-processing effects for all datasets and signatures, and find that these differ between microarray versions. Importantly, exploiting different pre-processing techniques in an ensemble technique improved classification for a majority of signatures. Conclusions: Assessing biomarkers using an ensemble of pre-processing techniques shows clear value across multiple diseases, datasets and biomarkers. Importantly, ensemble classification improves biomarkers with initially good results but does not result in spuriously improved performance for poor biomarkers. While further research is required, this approach has the potential to become a standard for transcriptomic biomarkers.",

author = "Fox, {Natalie S.} and Starmans, {Maud H. W.} and Haider, Syed and Lambin, Philippe and Boutros, {Paul C.}",

year = "2014",

month = jun,

day = "6",

doi = "10.1186/1471-2105-15-170",

language = "English",

volume = "15",

journal = "BMC Bioinformatics",

issn = "1471-2105",

publisher = "BioMed Central Ltd",

}

TY - JOUR

T1 - Ensemble analyses improve signatures of tumour hypoxia and reveal inter-platform differences

AU - Fox, Natalie S.

AU - Starmans, Maud H. W.

AU - Haider, Syed

AU - Lambin, Philippe

AU - Boutros, Paul C.

PY - 2014/6/6

Y1 - 2014/6/6

N2 - Background: The reproducibility of transcriptomic biomarkers across datasets remains poor, limiting clinical application. We and others have suggested that this is in-part caused by differential error-structure between datasets, and their incomplete removal by pre-processing algorithms. Methods: To test this hypothesis, we systematically assessed the effects of pre-processing on biomarker classification using 24 different pre-processing methods and 15 distinct signatures of tumour hypoxia in 10 datasets (2,143 patients). Results: We confirm strong pre-processing effects for all datasets and signatures, and find that these differ between microarray versions. Importantly, exploiting different pre-processing techniques in an ensemble technique improved classification for a majority of signatures. Conclusions: Assessing biomarkers using an ensemble of pre-processing techniques shows clear value across multiple diseases, datasets and biomarkers. Importantly, ensemble classification improves biomarkers with initially good results but does not result in spuriously improved performance for poor biomarkers. While further research is required, this approach has the potential to become a standard for transcriptomic biomarkers.

U2 - 10.1186/1471-2105-15-170

DO - 10.1186/1471-2105-15-170

M3 - Article

C2 - 24902696

SN - 1471-2105

VL - 15

JO - BMC Bioinformatics

JF - BMC Bioinformatics

M1 - 170

ER -