Abstract
Motivation
Today, we have an enormous amount of biomedical data, and both its size and its complexity have been increasing over time. Implementing standards is one of the key drivers of life sciences research as well as of technology transfer. More specifically, standards enable data accessibility, sharing, and integration, and therefore facilitate data harnessing and accelerate research and innovation transfer.
The life sciences community has widely developed and used Semantic Web technology standards for data representation and sharing. However, given the success of unsupervised machine learning methods such as Word2Vec and BERT, there is a need to develop new standards for sharing the (pre-trained) vector space embeddings of entities, in order to facilitate data reusability and method development. Motivated by this, we propose data and metadata standards for the FAIR distribution of vector embeddings and demonstrate the use of these standards in Bio2Vec, a platform for flexible, reliable, and standard-compliant data representation, sharing, integration, and analysis.
Availability:
The proposed metadata standard and an example are available in the ShEx format at Zenodo.
Original language | English |
---|---|
Publication status | Published - 31 Oct 2020 |
Event | 28th Conference on Intelligent Systems for Molecular Biology: Bio-Ontologies COSI, Online (ISCB, Leesburg, United States). Duration: 13 Jul 2020 → 16 Jul 2020. Conference number: 28. https://www.iscb.org/ismb2020 |
Conference
Conference | 28th Conference on Intelligent Systems for Molecular Biology |
---|---|
Abbreviated title | ISMB 2020 |
Country/Territory | United States |
City | Leesburg |
Period | 13/07/20 → 16/07/20 |
Internet address | https://www.iscb.org/ismb2020 |