A linked data representation for summary statistics and grouping criteria

James P. McCusker, Michel Dumontier, Shruthi Chari, Joanne S. Luciano, Deborah L. McGuinness

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

Summary statistics are fundamental to data science, and are the building blocks of statistical reasoning. Most of the data and statistics made available on government web sites are aggregate; however, until now, we have not had a suitable linked data representation available. We propose a way to express summary statistics across aggregate groups as linked data using Web Ontology Language (OWL) class-based sets, where members of the set contribute to the overall aggregate value. Additionally, many clinical studies in the biomedical field rely on demographic summaries of their study cohorts and the patients assigned to each arm. While most data query languages, including SPARQL, allow for computation of summary statistics, they do not provide a way to integrate those values back into the RDF graphs they were computed from. We represent this knowledge, which would otherwise be lost, through the use of OWL 2 punning semantics, the expression of aggregate grouping criteria as OWL classes with variables, and constructs from the Semanticscience Integrated Ontology (SIO) and the World Wide Web Consortium’s provenance ontology, PROV-O, providing interoperable representations that are well supported across the web of Linked Data. We evaluate these semantics using a Resource Description Framework (RDF) representation of patient case information from the Genomic Data Commons, a data portal from the National Cancer Institute.
Original language: English
Title of host publication: Computer Vision Winter Workshop 2023
Volume: 2549
Publication status: Published - 1 Jan 2019
Event: 2019 Joint International Workshops on Sensors and Actuators on the Web, and Semantic Statistics - Auckland, New Zealand
Duration: 27 Oct 2019 → 27 Oct 2019

Publication series

Series: CEUR Workshop Proceedings
ISSN: 1613-0073

Conference

Conference: 2019 Joint International Workshops on Sensors and Actuators on the Web, and Semantic Statistics
Abbreviated title: SAWSemStats 2019
Country/Territory: New Zealand
City: Auckland
Period: 27/10/19 → 27/10/19

Keywords

  • Data Exploration
  • Data Science
  • Interoperability
  • Knowledge Representation
  • Linked Data
  • Provenance
  • Summary Statistics
  • Transparency
