Text-based experiment retrieval in genomic databases

Duygu Dede Sener*, Hasan Ogul, Selen Basak

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

With the growing volume of genomic data in public repositories, efficient search methodologies have become a basic need for reaching relevant data. Current repositories cannot meet this need because they offer only a limited search option: lexical matching of the textual descriptions or metadata of the experiments. This technique is insufficient for detecting similarities between experiments within a large data collection. To address this limitation, we develop a text-based experiment retrieval framework that uses both lexical and semantic similarity approaches to find similarities between experiments, and we compare their retrieval performance. This study is the first attempt to use text-driven semantic analysis approaches for developing a retrieval framework for experiments. An empirical study was conducted on a large collection of textual descriptions of Arabidopsis microarray experiments from the Gene Expression Omnibus database. In the proposed model, Jaccard similarity was used as the lexical similarity approach; Latent Semantic Analysis, Probabilistic Latent Semantic Analysis and Latent Dirichlet Allocation were used as semantic similarity approaches to detect similarities between the textual descriptions of the experiments. According to the experimental results, text-driven semantic similarity approaches retrieve relevant experiments more successfully than the lexical similarity approach.
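
As an illustration of the lexical baseline named in the abstract, the following is a minimal sketch of Jaccard similarity between two textual descriptions, used to rank a small collection against a query. It is not the authors' implementation: the tokenisation rule, the function names (`tokenize`, `jaccard`, `rank`) and the example identifiers and descriptions are all assumptions made for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens.

    Assumption: a simple regex tokeniser; the paper's preprocessing
    (stop-word removal, stemming, etc.) is not specified here.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over the two token sets."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    union = tokens_a | tokens_b
    if not union:
        # Both descriptions are empty: define the similarity as 0.
        return 0.0
    return len(tokens_a & tokens_b) / len(union)


def rank(query: str, corpus: dict[str, str]) -> list[tuple[str, float]]:
    """Return (experiment id, score) pairs sorted by descending similarity."""
    scores = [(exp_id, jaccard(query, desc)) for exp_id, desc in corpus.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical experiment descriptions, not taken from any database.
corpus = {
    "exp_a": "cold stress response in roots",
    "exp_b": "light signalling in seedlings",
}
ranking = rank("cold stress response", corpus)
```

Because the score depends only on shared surface tokens, two descriptions of the same biology written with different vocabulary score low, which is the limitation the semantic approaches in the abstract are meant to address.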

Original language: English
Number of pages: 11
Journal: Journal of Information Science
Publication status: E-pub ahead of print - 3 Sept 2022

Keywords

  • Information retrieval
  • lexical similarity
  • microarray experiments
  • semantic similarity
  • text-based retrieval
  • ENRICHMENT ANALYSIS
  • ARABIDOPSIS
  • EXPLORATION
