Pattern recognition methods to relate time profiles of gene expression with phenotypic data: a comparative study

Diana M. Hendrickx; Danyel G. J. Jennen; Jacob J. Briede; Rachel Cavill; Theo M. de Kok; Jos C. S. Kleinjans

doi:10.1093/bioinformatics/btv108

Diana M. Hendrickx^*, Danyel G. J. Jennen, Jacob J. Briede, Rachel Cavill, Theo M. de Kok, Jos C. S. Kleinjans

^*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Motivation: Comparing time courses of gene expression with time courses of phenotypic data may provide new insights in cellular mechanisms. In this study, we compared the performance of five pattern recognition methods with respect to their ability to relate genes and phenotypic data: one classical method (k-means) and four methods especially developed for time series [Short Time-series Expression Miner (STEM), Linear Mixed Model mixtures, Dynamic Time Warping for -Omics and linear modeling with R/Bioconductor limma package]. The methods were evaluated using data available from toxicological studies that had the aim to relate gene expression with phenotypic end-points (i.e. to develop biomarkers for adverse outcomes). Additionally, technical aspects (influence of noise, number of time points and number of replicates) were evaluated on simulated data.

Results: None of the methods outperforms the others in terms of biology. Linear modeling with limma is mostly influenced by noise. STEM is mostly influenced by the number of biological replicates in the dataset, whereas k-means and linear modeling with limma are mostly influenced by the number of time points. In most cases, the results of the methods complement each other. We therefore provide recommendations to integrate the five methods.

Original language	English
Pages (from-to)	2115-2122
Journal	Bioinformatics
Volume	31
Issue number	13
DOIs	https://doi.org/10.1093/bioinformatics/btv108
Publication status	Published - 1 Jul 2015

Access to Document

10.1093/bioinformatics/btv108

Full Time
Final published version, 467 KB
Licence: Taverne

Cite this

@article{30c7a95e83e649eeb5f290f261c2cab8,

title = "Pattern recognition methods to relate time profiles of gene expression with phenotypic data: a comparative study",

abstract = "Motivation: Comparing time courses of gene expression with time courses of phenotypic data may provide new insights in cellular mechanisms. In this study, we compared the performance of five pattern recognition methods with respect to their ability to relate genes and phenotypic data: one classical method (k-means) and four methods especially developed for time series [Short Time-series Expression Miner (STEM), Linear Mixed Model mixtures, Dynamic Time Warping for -Omics and linear modeling with R/Bioconductor limma package]. The methods were evaluated using data available from toxicological studies that had the aim to relate gene expression with phenotypic end-points (i.e. to develop biomarkers for adverse outcomes). Additionally, technical aspects (influence of noise, number of time points and number of replicates) were evaluated on simulated data. Results: None of the methods outperforms the others in terms of biology. Linear modeling with limma is mostly influenced by noise. STEM is mostly influenced by the number of biological replicates in the dataset, whereas k-means and linear modeling with limma are mostly influenced by the number of time points. In most cases, the results of the methods complement each other. We therefore provide recommendations to integrate the five methods.",

author = "Hendrickx, {Diana M.} and Jennen, {Danyel G. J.} and Briede, {Jacob J.} and Cavill, Rachel and {de Kok}, {Theo M.} and Kleinjans, {Jos C. S.}",

year = "2015",

month = jul,

day = "1",

doi = "10.1093/bioinformatics/btv108",

language = "English",

volume = "31",

pages = "2115--2122",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "13",

}

TY - JOUR

T1 - Pattern recognition methods to relate time profiles of gene expression with phenotypic data: a comparative study

AU - Hendrickx, Diana M.

AU - Jennen, Danyel G. J.

AU - Briede, Jacob J.

AU - Cavill, Rachel

AU - de Kok, Theo M.

AU - Kleinjans, Jos C. S.

PY - 2015/7/1

Y1 - 2015/7/1

AB - Motivation: Comparing time courses of gene expression with time courses of phenotypic data may provide new insights in cellular mechanisms. In this study, we compared the performance of five pattern recognition methods with respect to their ability to relate genes and phenotypic data: one classical method (k-means) and four methods especially developed for time series [Short Time-series Expression Miner (STEM), Linear Mixed Model mixtures, Dynamic Time Warping for -Omics and linear modeling with R/Bioconductor limma package]. The methods were evaluated using data available from toxicological studies that had the aim to relate gene expression with phenotypic end-points (i.e. to develop biomarkers for adverse outcomes). Additionally, technical aspects (influence of noise, number of time points and number of replicates) were evaluated on simulated data. Results: None of the methods outperforms the others in terms of biology. Linear modeling with limma is mostly influenced by noise. STEM is mostly influenced by the number of biological replicates in the dataset, whereas k-means and linear modeling with limma are mostly influenced by the number of time points. In most cases, the results of the methods complement each other. We therefore provide recommendations to integrate the five methods.

U2 - 10.1093/bioinformatics/btv108

DO - 10.1093/bioinformatics/btv108

M3 - Article

C2 - 25701576

SN - 1367-4803

VL - 31

SP - 2115

EP - 2122

JO - Bioinformatics

JF - Bioinformatics

IS - 13

ER -