A resource to explore the discovery of rare diseases and their causative genes

F. Ehrhart; E.L. Willighagen; M. Kutmon; M. van Hoften; L.M.G. Curfs; C.T. Evelo

doi:10.1038/s41597-021-00905-y

F. Ehrhart^*, E.L. Willighagen, M. Kutmon, M. van Hoften, L.M.G. Curfs, C.T. Evelo

^*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Here, we describe a dataset with information about monogenic, rare diseases with a known genetic background, supplemented with manually extracted provenance for the disease itself and the discovery of the underlying genetic cause. We assembled a collection of 4166 rare monogenic diseases and linked them to 3163 causative genes, annotated with OMIM and Ensembl identifiers and HGNC symbols. The PubMed identifiers of the publications that first described the rare diseases, and of the publications that identified the genes causing the diseases, were added using information from OMIM, PubMed, Wikipedia, whonamedit.com, and Google Scholar. The data are available under a CC0 license as a spreadsheet and as RDF in a semantic model modified from DisGeNET, and were added to Wikidata. This dataset relies on publicly available data and publications with a PubMed identifier, but by our effort to make the data interoperable and linked, we can now analyse these data. Our analysis revealed the timeline of rare disease and causative gene discovery and links these discoveries to developments in methods.

Original language	English
Article number	124
Number of pages	8
Journal	Scientific Data
Volume	8
Issue number	1
DOIs	https://doi.org/10.1038/s41597-021-00905-y
Publication status	Published - 4 May 2021

Keywords

IMPERFECTA TYPE-IV
DISORDER

Access to Document

10.1038/s41597-021-00905-y

Licence: CC BY

Cite this

@article{38ee64422b534921906759b56d8642ab,

title = "A resource to explore the discovery of rare diseases and their causative genes",

abstract = "Here, we describe a dataset with information about monogenic, rare diseases with a known genetic background, supplemented with manually extracted provenance for the disease itself and the discovery of the underlying genetic cause. We assembled a collection of 4166 rare monogenic diseases and linked them to 3163 causative genes, annotated with OMIM and Ensembl identifiers and HGNC symbols. The PubMed identifiers of the publications that first described the rare diseases, and of the publications that identified the genes causing the diseases, were added using information from OMIM, PubMed, Wikipedia, whonamedit.com, and Google Scholar. The data are available under a CC0 license as a spreadsheet and as RDF in a semantic model modified from DisGeNET, and were added to Wikidata. This dataset relies on publicly available data and publications with a PubMed identifier, but by our effort to make the data interoperable and linked, we can now analyse these data. Our analysis revealed the timeline of rare disease and causative gene discovery and links these discoveries to developments in methods.",

keywords = "IMPERFECTA TYPE-IV, DISORDER",

author = "Ehrhart, F. and Willighagen, E.L. and Kutmon, M. and {van Hoften}, M. and Curfs, L.M.G. and Evelo, C.T.",

year = "2021",

month = may,

day = "4",

doi = "10.1038/s41597-021-00905-y",

language = "English",

volume = "8",

journal = "Scientific Data",

issn = "2052-4463",

publisher = "Nature Publishing Group",

number = "1",

}

TY - JOUR

T1 - A resource to explore the discovery of rare diseases and their causative genes

AU - Ehrhart, F.

AU - Willighagen, E.L.

AU - Kutmon, M.

AU - van Hoften, M.

AU - Curfs, L.M.G.

AU - Evelo, C.T.

PY - 2021/5/4

Y1 - 2021/5/4

AB - Here, we describe a dataset with information about monogenic, rare diseases with a known genetic background, supplemented with manually extracted provenance for the disease itself and the discovery of the underlying genetic cause. We assembled a collection of 4166 rare monogenic diseases and linked them to 3163 causative genes, annotated with OMIM and Ensembl identifiers and HGNC symbols. The PubMed identifiers of the publications that first described the rare diseases, and of the publications that identified the genes causing the diseases, were added using information from OMIM, PubMed, Wikipedia, whonamedit.com, and Google Scholar. The data are available under a CC0 license as a spreadsheet and as RDF in a semantic model modified from DisGeNET, and were added to Wikidata. This dataset relies on publicly available data and publications with a PubMed identifier, but by our effort to make the data interoperable and linked, we can now analyse these data. Our analysis revealed the timeline of rare disease and causative gene discovery and links these discoveries to developments in methods.

KW - IMPERFECTA TYPE-IV

KW - DISORDER

U2 - 10.1038/s41597-021-00905-y

DO - 10.1038/s41597-021-00905-y

M3 - Article

C2 - 33947870

SN - 2052-4463

VL - 8

JO - Scientific Data

JF - Scientific Data

IS - 1

M1 - 124

ER -