Metadata standards for the FAIR sharing of vector embeddings in Biomedicine

Senay Kafkas, Remzi Celebi, Mehdi Ali, Hajira Jabeen, Michel Dumontier, Robert Hoehndorf

Research output: Contribution to conference, Paper, Academic

Abstract

Motivation
Biomedical data are now available in enormous volumes, and both their size and their complexity continue to grow. The implementation of standards is one of the key drivers of life sciences research and of technology transfer. More specifically, standards enable data accessibility, sharing and integration, and therefore facilitate data harnessing and accelerate research and innovation transfer.
The life sciences community has widely developed and used Semantic Web technology standards for data representation and sharing. However, given the success of unsupervised machine learning methods such as Word2Vec and BERT, there is a need for new standards for sharing (pre-trained) vector space embeddings of entities, in order to facilitate the reuse of data and the development of methods. Motivated by this, we propose data and metadata standards for the FAIR distribution of vector embeddings and demonstrate their use in Bio2Vec, a platform providing flexible, reliable and standard-compliant data representation, sharing, integration and analysis.

Availability:
The proposed metadata standard and an example are available in the ShEx format at Zenodo.
Original language: English
Publication status: Published - 31 Oct 2020
Event: 28th Conference on Intelligent Systems for Molecular Biology: Bio-Ontologies COSI - Online, ISCB, Leesburg, United States
Duration: 13 Jul 2020 to 16 Jul 2020
Conference number: 28
https://www.iscb.org/ismb2020

Conference

Conference: 28th Conference on Intelligent Systems for Molecular Biology
Abbreviated title: ISMB 2020
Country/Territory: United States
City: Leesburg
Period: 13/07/20 to 16/07/20