An evaluation of a novel approach for clustering genes with dissimilar replicates

Ozan Cinar; Cem Iyigun; Ozlem Ilk

doi:10.1080/03610918.2020.1839092

Corresponding author: Ozan Cinar

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Clustering genes is a step in microarray studies that demands several considerations. First, expression levels may be collected as time-series, which should be accounted for appropriately. Furthermore, genes may behave differently in different biological replicates due to their genetic backgrounds. Highlighting such genes may deepen the study; however, it introduces further complexities for clustering. The third concern stems from the existence of a large number of constant genes, which imposes a heavy computational burden. Finally, the number of clusters is not known in advance; therefore, a clustering algorithm should be able to recommend a meaningful number of clusters. In this study, we use a simulation study to evaluate a recently proposed clustering algorithm that promises to address these issues. The methodology treats each gene as a combination of its replications and accounts for the time dependency. Furthermore, it computes cluster validation scores to suggest possible numbers of clusters. Results show that the methodology is able to find the clusters and highlight the genes with differences among the replications, separate the constant genes to reduce the computational burden, and suggest a meaningful number of clusters. Furthermore, our results show that traditional distance metrics are not efficient in clustering the short time-series correctly.

Original language	English
Pages (from-to)	7458-7471
Number of pages	14
Journal	Communications in Statistics-Simulation and Computation
Volume	51
Issue number	12
Early online date	5 Dec 2020
DOIs	https://doi.org/10.1080/03610918.2020.1839092
Publication status	Published - 1 Dec 2022

Keywords

Clustering
cluster validation
microarray gene expression
replication
short time-series

Access to Document

10.1080/03610918.2020.1839092

Licence: CC BY-NC-ND

Cite this

@article{c7c5f772a00b452d8589912731dfccf5,

title = "An evaluation of a novel approach for clustering genes with dissimilar replicates",

abstract = "Clustering the genes is a step in microarray studies which demands several considerations. First, the expression levels can be collected as time-series which should be accounted for appropriately. Furthermore, genes may behave differently in different biological replicates due to their genetic backgrounds. Highlighting such genes may deepen the study; however, it introduces further complexities for clustering. The third concern stems from the existence of a large amount of constant genes which demands a heavy computational burden. Finally, the number of clusters is not known in advance; therefore, a clustering algorithm should be able to recommend meaningful number of clusters. In this study, we evaluate a recently proposed clustering algorithm that promises to address these issues with a simulation study. The methodology accepts each gene as a combination of its replications and accounts for the time dependency. Furthermore, it computes cluster validation scores to suggest possible numbers of clusters. Results show that the methodology is able to find the clusters and highlight the genes with differences among the replications, separate the constant genes to reduce the computational burden, and suggest meaningful number of clusters. Furthermore, our results show that traditional distance metrics are not efficient in clustering the short time-series correctly.",

keywords = "Clustering, cluster validation, microarray gene expression, replication, short time-series",

author = "Ozan Cinar and Cem Iyigun and Ozlem Ilk",

note = "Publisher Copyright: {\textcopyright} 2020 The Author(s). Published with license by Taylor and Francis Group, LLC.",

year = "2022",

month = dec,

day = "1",

doi = "10.1080/03610918.2020.1839092",

language = "English",

volume = "51",

pages = "7458--7471",

journal = "Communications in Statistics-Simulation and Computation",

issn = "0361-0918",

publisher = "Routledge/Taylor \& Francis Group",

number = "12",

}

TY - JOUR

T1 - An evaluation of a novel approach for clustering genes with dissimilar replicates

AU - Cinar, Ozan

AU - Iyigun, Cem

AU - Ilk, Ozlem

PY - 2022/12/1

Y1 - 2022/12/1

N2 - Clustering the genes is a step in microarray studies which demands several considerations. First, the expression levels can be collected as time-series which should be accounted for appropriately. Furthermore, genes may behave differently in different biological replicates due to their genetic backgrounds. Highlighting such genes may deepen the study; however, it introduces further complexities for clustering. The third concern stems from the existence of a large amount of constant genes which demands a heavy computational burden. Finally, the number of clusters is not known in advance; therefore, a clustering algorithm should be able to recommend meaningful number of clusters. In this study, we evaluate a recently proposed clustering algorithm that promises to address these issues with a simulation study. The methodology accepts each gene as a combination of its replications and accounts for the time dependency. Furthermore, it computes cluster validation scores to suggest possible numbers of clusters. Results show that the methodology is able to find the clusters and highlight the genes with differences among the replications, separate the constant genes to reduce the computational burden, and suggest meaningful number of clusters. Furthermore, our results show that traditional distance metrics are not efficient in clustering the short time-series correctly.

AB - Clustering the genes is a step in microarray studies which demands several considerations. First, the expression levels can be collected as time-series which should be accounted for appropriately. Furthermore, genes may behave differently in different biological replicates due to their genetic backgrounds. Highlighting such genes may deepen the study; however, it introduces further complexities for clustering. The third concern stems from the existence of a large amount of constant genes which demands a heavy computational burden. Finally, the number of clusters is not known in advance; therefore, a clustering algorithm should be able to recommend meaningful number of clusters. In this study, we evaluate a recently proposed clustering algorithm that promises to address these issues with a simulation study. The methodology accepts each gene as a combination of its replications and accounts for the time dependency. Furthermore, it computes cluster validation scores to suggest possible numbers of clusters. Results show that the methodology is able to find the clusters and highlight the genes with differences among the replications, separate the constant genes to reduce the computational burden, and suggest meaningful number of clusters. Furthermore, our results show that traditional distance metrics are not efficient in clustering the short time-series correctly.

KW - Clustering

KW - cluster validation

KW - microarray gene expression

KW - replication

KW - short time-series

U2 - 10.1080/03610918.2020.1839092

DO - 10.1080/03610918.2020.1839092

M3 - Article

SN - 0361-0918

VL - 51

SP - 7458

EP - 7471

JO - Communications in Statistics-Simulation and Computation

JF - Communications in Statistics-Simulation and Computation

IS - 12

ER -