Abstract

Objectives: Data-driven decision support tools are increasingly recognized as having the potential to transform health care. However, such tools are often developed on predefined research datasets without adequate knowledge of where the data originated or how they were selected. How a dataset is extracted from a clinical database can profoundly affect the validity, interpretability, and interoperability of the dataset and of downstream analyses, yet this process is rarely reported. Therefore, we present a case study illustrating how a definitive patient list was extracted from a clinical source database and how this process can be reported.

Study Design and Setting: A single-center observational study was performed at an academic hospital in the Netherlands to illustrate the impact of selecting a definitive patient list for research from a clinical source database, and the importance of documenting this process. All admissions recorded in the critical care database between January 1, 2013, and January 1, 2023, were used.

Results: An interdisciplinary team collaborated to identify and address potential sources of data insufficiency and uncertainty. We demonstrate a stepwise data preparation process that reduced the clinical source database of 54,218 admissions to a definitive patient list of 21,553 admissions. Transparent documentation of the data preparation process improves the quality of the definitive patient list before the corresponding patient data are analyzed. This study generated seven important recommendations for preparing observational health-care data for research purposes.

Conclusion: Documenting data preparation is essential for understanding a research dataset derived from a clinical source database before health-care data are analyzed. The findings contribute to establishing data standards and offer insight into the complexities of preparing health-care data for scientific investigation. Meticulous data preparation, and documentation thereof, will improve research validity and advance critical care.
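The abstract does not list the individual selection steps. The following is therefore only a minimal Python sketch, under stated assumptions, of how a stepwise selection could be documented while it runs. Each step records how many admissions entered, how many were excluded, and how many remained, which yields a table that can be reported alongside the results. The record fields (`admission_id`, `admit_date`, `is_test_record`), the three example steps, and the helper names (`SelectionLog`, `within_window`, `drop_test_records`, `drop_duplicate_admissions`) are hypothetical. They are not the authors' criteria or code, and the five toy records are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Step = Callable[[List[Record]], List[Record]]


@dataclass
class StepLog:
    description: str
    n_before: int
    n_after: int

    @property
    def n_excluded(self) -> int:
        return self.n_before - self.n_after


@dataclass
class SelectionLog:
    """Applies selection steps in order and records the counts for each one."""
    steps: List[StepLog] = field(default_factory=list)

    def apply(self, records: List[Record], description: str, step: Step) -> List[Record]:
        result = step(records)
        self.steps.append(StepLog(description, len(records), len(result)))
        return result

    def render(self) -> str:
        # One row per step: records in, records excluded, records out.
        header = f"{'Step':<40}{'Before':>8}{'Excluded':>10}{'After':>8}"
        rows = [f"{s.description:<40}{s.n_before:>8}{s.n_excluded:>10}{s.n_after:>8}"
                for s in self.steps]
        return "\n".join([header, *rows])


def within_window(start: date, end: date) -> Step:
    # Assumption: half-open window, start <= admit_date < end.
    return lambda rs: [r for r in rs if start <= r["admit_date"] < end]


def drop_test_records(rs: List[Record]) -> List[Record]:
    # 'is_test_record' is a hypothetical flag marking dummy or test entries.
    return [r for r in rs if not r["is_test_record"]]


def drop_duplicate_admissions(rs: List[Record]) -> List[Record]:
    # Keep only the first occurrence of each admission_id.
    seen = set()
    out = []
    for r in rs:
        if r["admission_id"] not in seen:
            seen.add(r["admission_id"])
            out.append(r)
    return out


# Invented toy records, not study data.
admissions: List[Record] = [
    {"admission_id": 1, "admit_date": date(2012, 12, 30), "is_test_record": False},
    {"admission_id": 2, "admit_date": date(2015, 6, 1), "is_test_record": True},
    {"admission_id": 3, "admit_date": date(2018, 3, 14), "is_test_record": False},
    {"admission_id": 3, "admit_date": date(2018, 3, 14), "is_test_record": False},
    {"admission_id": 4, "admit_date": date(2022, 11, 2), "is_test_record": False},
]

log = SelectionLog()
rows = log.apply(admissions, "Admitted in study window",
                 within_window(date(2013, 1, 1), date(2023, 1, 1)))
rows = log.apply(rows, "Exclude test records", drop_test_records)
rows = log.apply(rows, "Remove duplicate admissions", drop_duplicate_admissions)
print(log.render())
```

On the toy data, the printed table shows the list shrinking from 5 to 4, 3, and then 2 admissions, with one exclusion per step. Keeping each criterion as a separate, named function makes the resulting log self-describing. For scale, the overall reduction reported in the abstract, from 54,218 to 21,553 admissions, corresponds to 32,665 excluded admissions (about 39.8% retained).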

Original language: English
Article number: 111342
Journal: Journal of Clinical Epidemiology
Volume: 170
Early online date: 2 Apr 2024
Publication status: E-pub ahead of print - 2 Apr 2024

Keywords

  • bias
  • clinical database
  • data cleaning
  • data preparation
  • table 0
