Assessing Data Quality in Heterogeneous Health Care Integration: Simulation Study of the AIDAVA Framework

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background: Integrated health data are foundational for secondary use, research, and policymaking. However, data quality issues, such as missing values and inconsistencies, are common due to the heterogeneity of health data sources. Existing frameworks often use static, one-time assessments, which limit their ability to address quality issues across evolving data pipelines.

Objective: This study evaluates the AIDAVA (artificial intelligence-powered data curation and validation) data quality framework, which introduces dynamic, life cycle-based validation of health data using knowledge graph technologies and SHACL (Shapes Constraint Language)-based rules. The framework is assessed for its ability to detect and manage data quality issues, specifically completeness and consistency, during integration.

Methods: Using the MIMIC-III (Medical Information Mart for Intensive Care-III) dataset, we simulated real-world data quality challenges by introducing structured noise, including missing values and logical inconsistencies. The data were transformed into source knowledge graphs and integrated into a unified personal health knowledge graph. SHACL validation rules were applied iteratively during the integration process, and data quality was assessed under varying noise levels and integration orders.

Results: The AIDAVA framework effectively detected completeness and consistency issues across all scenarios. Completeness was shown to influence the interpretability of consistency scores, and domain-specific attributes (eg, diagnoses and procedures)

Conclusions: AIDAVA supports dynamic, rule-based validation throughout the data life cycle. By addressing both dimension-specific vulnerabilities and cross-dimensional effects, it lays the groundwork for scalable, high-quality health data integration. Future work should explore deployment in live clinical settings and expand to additional quality dimensions.
Original language: English
Article number: e75275
Number of pages: 15
Journal: JMIR Medical Informatics
Volume: 13
Publication status: Published - 2025

Keywords

  • data quality
  • knowledge graph
  • ontology
  • health data
  • data quality dimensions
  • data quality assessment
  • secondary use
  • data quality framework
  • fit for purpose
  • CLINICAL-RESEARCH
  • RECORDS
