Principal components analysis and the reported low intrinsic dimensionality of gene expression microarray data

Michael Lenz, Franz-Josef Müller, Martin Zenke, Andreas Schuppert

Research output: Contribution to journal, Article, Academic, peer-review


Abstract

Principal components analysis (PCA) is a common unsupervised method for the analysis of gene expression microarray data, providing information on the overall structure of the analyzed dataset. In recent years, it has been applied to very large datasets covering many different tissues and cell types to create a low-dimensional global map of human gene expression. Here, we re-evaluate this approach and show that the linear intrinsic dimensionality of this global map is higher than previously reported. Furthermore, we analyze the cases in which PCA fails to detect biologically relevant information and point the reader to methods that overcome these limitations. Our results refine the current understanding of the overall structure of gene expression spaces and show that whether PCA captures a biological signal depends critically on the effect size of that signal as well as on the fraction of samples containing it.
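The closing claim of the abstract, that what PCA picks up depends on both the effect size of a signal and the fraction of samples carrying it, can be illustrated with a small simulation. This is a minimal sketch and not the authors' analysis: the Gaussian noise model, the sample and gene counts, and the helper names `pca_scores`, `simulate` and `pc1_label_correlation` are hypothetical choices made only for illustration. It plants a mean shift in a few genes for a subset of samples, runs PCA via the SVD of the centered data, and reports how strongly the first principal component (PC1) scores correlate with the indicator of which samples carry the shift.

```python
import numpy as np


def pca_scores(X):
    """PCA via SVD of the column-centered matrix X (samples x genes).

    Returns the explained-variance ratio of every component and the
    sample scores (coordinates of each sample on each component).
    """
    Xc = X - X.mean(axis=0)                          # center every gene
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2                                     # proportional to component variances
    return var / var.sum(), U * S                    # (ratios, scores)


def simulate(n_samples=400, n_genes=100, n_signal_genes=15,
             effect=2.0, fraction=0.5, seed=0):
    """Hypothetical toy data: i.i.d. N(0, 1) noise plus a mean shift of size
    `effect` in the first `n_signal_genes` genes of a `fraction` of samples."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_genes))
    n_pos = int(round(fraction * n_samples))         # samples carrying the signal
    X[:n_pos, :n_signal_genes] += effect
    labels = np.zeros(n_samples, dtype=bool)
    labels[:n_pos] = True
    return X, labels


def pc1_label_correlation(X, labels):
    """Absolute Pearson correlation between PC1 scores and the 0/1 signal indicator."""
    _, scores = pca_scores(X)
    return abs(np.corrcoef(scores[:, 0], labels.astype(float))[0, 1])


if __name__ == "__main__":
    # Vary the fraction of signal-carrying samples and the effect size.
    for fraction, effect in [(0.5, 2.0), (0.005, 2.0), (0.005, 8.0)]:
        X, labels = simulate(fraction=fraction, effect=effect)
        r = pc1_label_correlation(X, labels)
        print(f"fraction={fraction:<6} effect={effect:<4} |r(PC1, signal)|={r:.2f}")
```

Under these assumptions one would expect a shift carried by half of the samples to dominate PC1, the same shift carried by only 2 of 400 samples to be lost among the noise components, and a larger effect size for those 2 samples to bring the signal back onto PC1.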

Original language: English
Article number: 25696
Pages (from-to): 25696
Number of pages: 11
Journal: Scientific Reports
Volume: 6
Publication status: Published, 2 Jun 2016

Keywords

  • CANCER
