The health care and life sciences community profile for dataset descriptions

Michel Dumontier; Alasdair J. G. Gray; M Scott Marshall; Vladimir Alexiev; Peter Ansell; Gary Bader; Joachim Baran; Jerven T. Bolleman; Alison Callahan; Jose Cruz-Toledo; Pascale Gaudet; Erich A. Gombocz; Alejandra N. Gonzalez-Beltran; Paul Groth; Melissa Haendel; Maori Ito; Simon Jupp; Nick Juty; Toshiaki Katayama; Norio Kobayashi; Kalpana Krishnaswami; Camille Laibe; Nicolas Le Novere; Simon Lin; James Malone; Michael Miller; Christopher J. Mungall; Laurens Rietveld; Sarala M. Wimalaratne; Atsuko Yamaguchi

doi:10.7717/peerj.2331

Corresponding authors: Michel Dumontier, Alasdair J. G. Gray, M Scott Marshall

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high-quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.

Original language	English
Article number	2331
Journal	PEERJ
Volume	4
DOIs	https://doi.org/10.7717/peerj.2331
Publication status	Published - 16 Aug 2016

Keywords

Data profiling
Dataset descriptions
Metadata
Provenance
FAIR data

Access to Document

10.7717/peerj.2331

Licence: CC BY

Cite this

Dumontier, M., Gray, A. J. G., Marshall, M. S., Alexiev, V., Ansell, P., Bader, G., Baran, J., Bolleman, J. T., Callahan, A., Cruz-Toledo, J., Gaudet, P., Gombocz, E. A., Gonzalez-Beltran, A. N., Groth, P., Haendel, M., Ito, M., Jupp, S., Juty, N., Katayama, T., ... Yamaguchi, A. (2016). The health care and life sciences community profile for dataset descriptions. PEERJ, 4, Article 2331. https://doi.org/10.7717/peerj.2331

@article{e94fd67a8eff4ca2b943ca71a0b6a1bd,
  title = "The health care and life sciences community profile for dataset descriptions",
  abstract = "Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.",
  keywords = "Data profiling, Dataset descriptions, Metadata, Provenance, FAIR data",
  author = "Michel Dumontier and Gray, {Alasdair J. G.} and Marshall, {M Scott} and Vladimir Alexiev and Peter Ansell and Gary Bader and Joachim Baran and Bolleman, {Jerven T.} and Alison Callahan and Jose Cruz-Toledo and Pascale Gaudet and Gombocz, {Erich A.} and Gonzalez-Beltran, {Alejandra N.} and Paul Groth and Melissa Haendel and Maori Ito and Simon Jupp and Nick Juty and Toshiaki Katayama and Norio Kobayashi and Kalpana Krishnaswami and Camille Laibe and {Le Novere}, Nicolas and Simon Lin and James Malone and Michael Miller and Mungall, {Christopher J.} and Laurens Rietveld and Wimalaratne, {Sarala M.} and Atsuko Yamaguchi",
  year = "2016",
  month = aug,
  day = "16",
  doi = "10.7717/peerj.2331",
  language = "English",
  volume = "4",
  journal = "PEERJ",
  issn = "2167-8359",
  publisher = "PeerJ Inc.",
}

Dumontier, M, Gray, AJG, Marshall, MS, Alexiev, V, Ansell, P, Bader, G, Baran, J, Bolleman, JT, Callahan, A, Cruz-Toledo, J, Gaudet, P, Gombocz, EA, Gonzalez-Beltran, AN, Groth, P, Haendel, M, Ito, M, Jupp, S, Juty, N, Katayama, T, Kobayashi, N, Krishnaswami, K, Laibe, C, Le Novere, N, Lin, S, Malone, J, Miller, M, Mungall, CJ, Rietveld, L, Wimalaratne, SM & Yamaguchi, A 2016, 'The health care and life sciences community profile for dataset descriptions', PEERJ, vol. 4, 2331. https://doi.org/10.7717/peerj.2331

TY  - JOUR
T1  - The health care and life sciences community profile for dataset descriptions
AU  - Dumontier, Michel
AU  - Gray, Alasdair J. G.
AU  - Marshall, M Scott
AU  - Alexiev, Vladimir
AU  - Ansell, Peter
AU  - Bader, Gary
AU  - Baran, Joachim
AU  - Bolleman, Jerven T.
AU  - Callahan, Alison
AU  - Cruz-Toledo, Jose
AU  - Gaudet, Pascale
AU  - Gombocz, Erich A.
AU  - Gonzalez-Beltran, Alejandra N.
AU  - Groth, Paul
AU  - Haendel, Melissa
AU  - Ito, Maori
AU  - Jupp, Simon
AU  - Juty, Nick
AU  - Katayama, Toshiaki
AU  - Kobayashi, Norio
AU  - Krishnaswami, Kalpana
AU  - Laibe, Camille
AU  - Le Novere, Nicolas
AU  - Lin, Simon
AU  - Malone, James
AU  - Miller, Michael
AU  - Mungall, Christopher J.
AU  - Rietveld, Laurens
AU  - Wimalaratne, Sarala M.
AU  - Yamaguchi, Atsuko
PY  - 2016/8/16
Y1  - 2016/8/16
N2  - Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.
AB  - Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.
KW  - Data profiling
KW  - Dataset descriptions
KW  - Metadata
KW  - Provenance
KW  - FAIR data
U2  - 10.7717/peerj.2331
DO  - 10.7717/peerj.2331
M3  - Article
C2  - 27602295
SN  - 2167-8359
VL  - 4
JO  - PEERJ
JF  - PEERJ
M1  - 2331
ER  - 