A linked open data representation of patents registered in the US from 2005-2017

Mofeed M Hassan, Amrapali Zaveri, Jens Lehmann

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Patents are widely used to protect intellectual property and as a measure of innovation output. Each year, the USPTO grants over 150,000 patents to individuals and companies all over the world. In fact, there were more than 280,000 patent grants issued in the US in 2015. However, accessing, searching and analyzing those patents is often still cumbersome and inefficient. To overcome those problems, Google indexes patents and converts them to Extensible Markup Language (XML) files using Optical Character Recognition (OCR) techniques. In this article, we take this idea one step further and provide semantically rich, machine-readable patents using the Linked Data principles. We have converted the data spanning 12 years (2005-2017) from XML to Resource Description Framework (RDF) format, conforming to the Linked Data principles, and made them publicly available for re-use. This data can be integrated with other data sources in order to further simplify use cases such as trend analysis, structured patent search and exploration, and societal progress measurements. We describe the conversion, publishing and interlinking process along with several use cases for the USPTO Linked Patent data.

Original language: English
Article number: 180279
Number of pages: 9
Journal: Scientific data
Volume: 5
DOI: 10.1038/sdata.2018.279
Publication status: Published - 4 Dec 2018

Cite this

@article{82ff4bdc825f44e0af834ff6ac77de32,
title = "A linked open data representation of patents registered in the US from 2005-2017",
abstract = "Patents are widely used to protect intellectual property and a measure of innovation output. Each year, the USPTO grants over 150,000 patents to individuals and companies all over the world. In fact, there were more than 280,000 patent grants issued in the US in 2015. However, accessing, searching and analyzing those patents is often still cumbersome and inefficient. To overcome those problems, Google indexes patents and converts them to Extensible Markup Language (XML) files using Optical Character Recognition (OCR) techniques. In this article, we take this idea one step further and provide semantically rich, machine-readable patents using the Linked Data principles. We have converted the data spanning 12 years - i.e. 2005-2017 from XML to Resource Description Framework (RDF) format, conforming to the Linked Data principles and made them publicly available for re-use. This data can be integrated with other data sources in order to further simplify use cases such as trend analysis, structured patent search \& exploration and societal progress measurements. We describe the conversion, publishing, interlinking process along with several use cases for the USPTO Linked Patent data.",
author = "Hassan, {Mofeed M} and Amrapali Zaveri and Jens Lehmann",
year = "2018",
month = "12",
day = "4",
doi = "10.1038/sdata.2018.279",
language = "English",
volume = "5",
journal = "Scientific data",
issn = "2052-4463",
publisher = "Nature Publishing Group",
}

A linked open data representation of patents registered in the US from 2005-2017. / Hassan, Mofeed M; Zaveri, Amrapali; Lehmann, Jens.

In: Scientific data, Vol. 5, 180279, 04.12.2018.

TY - JOUR

T1 - A linked open data representation of patents registered in the US from 2005-2017

AU - Hassan, Mofeed M

AU - Zaveri, Amrapali

AU - Lehmann, Jens

PY - 2018/12/4

Y1 - 2018/12/4

N2 - Patents are widely used to protect intellectual property and a measure of innovation output. Each year, the USPTO grants over 150,000 patents to individuals and companies all over the world. In fact, there were more than 280,000 patent grants issued in the US in 2015. However, accessing, searching and analyzing those patents is often still cumbersome and inefficient. To overcome those problems, Google indexes patents and converts them to Extensible Markup Language (XML) files using Optical Character Recognition (OCR) techniques. In this article, we take this idea one step further and provide semantically rich, machine-readable patents using the Linked Data principles. We have converted the data spanning 12 years - i.e. 2005-2017 from XML to Resource Description Framework (RDF) format, conforming to the Linked Data principles and made them publicly available for re-use. This data can be integrated with other data sources in order to further simplify use cases such as trend analysis, structured patent search & exploration and societal progress measurements. We describe the conversion, publishing, interlinking process along with several use cases for the USPTO Linked Patent data.

AB - Patents are widely used to protect intellectual property and a measure of innovation output. Each year, the USPTO grants over 150,000 patents to individuals and companies all over the world. In fact, there were more than 280,000 patent grants issued in the US in 2015. However, accessing, searching and analyzing those patents is often still cumbersome and inefficient. To overcome those problems, Google indexes patents and converts them to Extensible Markup Language (XML) files using Optical Character Recognition (OCR) techniques. In this article, we take this idea one step further and provide semantically rich, machine-readable patents using the Linked Data principles. We have converted the data spanning 12 years - i.e. 2005-2017 from XML to Resource Description Framework (RDF) format, conforming to the Linked Data principles and made them publicly available for re-use. This data can be integrated with other data sources in order to further simplify use cases such as trend analysis, structured patent search & exploration and societal progress measurements. We describe the conversion, publishing, interlinking process along with several use cases for the USPTO Linked Patent data.

U2 - 10.1038/sdata.2018.279

DO - 10.1038/sdata.2018.279

M3 - Article

C2 - 30512011

VL - 5

JO - Scientific data

JF - Scientific data

SN - 2052-4463

M1 - 180279

ER -