An evaluation of a novel approach for clustering genes with dissimilar replicates

Ozan Cinar*, Cem Iyigun, Ozlem Ilk

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Clustering genes is a step in microarray studies that demands several considerations. First, expression levels can be collected as time-series, and this time dependency should be accounted for appropriately. Second, genes may behave differently in different biological replicates due to their genetic backgrounds. Highlighting such genes may deepen the study; however, it introduces further complexity for clustering. Third, the existence of a large number of constant genes imposes a heavy computational burden. Finally, the number of clusters is not known in advance; therefore, a clustering algorithm should be able to recommend a meaningful number of clusters. In this study, we use a simulation study to evaluate a recently proposed clustering algorithm that promises to address these issues. The methodology treats each gene as a combination of its replications and accounts for the time dependency. Furthermore, it computes cluster validation scores to suggest possible numbers of clusters. The results show that the methodology is able to find the clusters, highlight the genes with differences among the replications, separate the constant genes to reduce the computational burden, and suggest a meaningful number of clusters. Our results also show that traditional distance metrics are not effective at clustering short time-series correctly.

Original language: English
Pages (from-to): 7458-7471
Number of pages: 14
Journal: Communications in Statistics-Simulation and Computation
Volume: 51
Issue number: 12
Early online date: 5 Dec 2020
DOIs
Publication status: Published - 1 Dec 2022

Keywords

  • Clustering
  • cluster validation
  • microarray gene expression
  • replication
  • short time-series
