Pattern recognition methods to relate time profiles of gene expression with phenotypic data: a comparative study

Diana M. Hendrickx*, Danyel G. J. Jennen, Jacob J. Briede, Rachel Cavill, Theo M. de Kok, Jos C. S. Kleinjans

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Motivation: Comparing time courses of gene expression with time courses of phenotypic data may provide new insights into cellular mechanisms. In this study, we compared the performance of five pattern recognition methods with respect to their ability to relate genes and phenotypic data: one classical method (k-means) and four methods developed specifically for time series [Short Time-series Expression Miner (STEM), Linear Mixed Model mixtures, Dynamic Time Warping for -Omics and linear modeling with the R/Bioconductor limma package]. The methods were evaluated using data from toxicological studies that aimed to relate gene expression to phenotypic end-points (i.e. to develop biomarkers for adverse outcomes). Additionally, technical aspects (influence of noise, number of time points and number of replicates) were evaluated on simulated data.

Results: None of the methods outperforms the others in terms of biology. Linear modeling with limma is mostly influenced by noise. STEM is mostly influenced by the number of biological replicates in the dataset, whereas k-means and linear modeling with limma are mostly influenced by the number of time points. In most cases, the results of the methods complement each other. We therefore provide recommendations for integrating the five methods.
Original language: English
Pages (from-to): 2115-2122
Journal: Bioinformatics
Volume: 31
Issue number: 13
Publication status: Published - 1 Jul 2015
