Benefits of a clinical data warehouse with data mining tools to collect data for a radiotherapy trial

Erik Roelofs*, Lucas Persoon, Sebastiaan Nijsten, Wolfgang Wiessler, Andre Dekker, Philippe Lambin

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

42 Citations (Web of Science)
50 Downloads (Pure)


Introduction: Collecting trial data in a medical environment is at present mostly performed manually and is therefore time-consuming, prone to errors and, given the complexity of the data, often incomplete. Because the information is often scattered over multiple data sources, faster and more accurate methods are needed to improve data quality and shorten data collection times. The purpose of this study is to investigate the possible benefit of modern data warehouse technology in the radiation oncology field.

Material and methods: In this study, a Computer Aided Theragnostics (CAT) data warehouse combined with automated tools for feature extraction was benchmarked against the regular manual data collection process. Two sets of clinical parameters were compiled, one for non-small cell lung cancer (NSCLC) and one for rectal cancer, using 27 patients per disease. Data collection times and inconsistencies were compared between the manual and the automated extraction methods.

Results: The average time per case to collect the NSCLC data was 10.4 ± 2.1 min manually and 4.3 ± 1.1 min with the automated method (p < 0.001). For rectal cancer, these times were 13.5 ± 4.1 and 6.8 ± 2.4 min, respectively (p < 0.001). A discrepancy between the manual and automated methods was found in 3.2% of the data collected for NSCLC and in 5.3% of the data collected for rectal cancer.

Conclusions: Aggregating multiple data sources in a data warehouse, combined with tools for extracting the relevant parameters, shortens data collection times and offers the ability to improve data quality. The initial investment in digitizing the data is expected to be offset by the flexibility it provides for data analysis. Furthermore, successive investigations can easily select trial candidates and extract new parameters from the existing databases.
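As a quick check on the reported results, the per-case saving and the relative reduction in collection time can be derived from the reported mean times alone. The sketch below does only that arithmetic. The names `REPORTED_MEANS`, `time_savings` and `summarize` are illustrative and not part of the study, and the code does not attempt to reproduce the standard deviations or p-values, which come from the authors' own statistical analysis.

```python
# Illustrative arithmetic on the mean times reported in the abstract.
# Not a reanalysis: standard deviations and p-values are not used here.

# Mean data-collection time per case, in minutes, as reported.
REPORTED_MEANS = {
    "NSCLC": {"manual": 10.4, "automated": 4.3},
    "rectal cancer": {"manual": 13.5, "automated": 6.8},
}
PATIENTS_PER_DISEASE = 27  # cohort size per disease, as reported


def time_savings(manual: float, automated: float) -> tuple[float, float]:
    """Return (absolute saving in minutes, relative saving in percent) per case."""
    saved = manual - automated
    return saved, 100.0 * saved / manual


def summarize(means: dict = REPORTED_MEANS, n: int = PATIENTS_PER_DISEASE) -> dict:
    """Summarize per-case and whole-cohort time savings for each disease."""
    out = {}
    for disease, t in means.items():
        saved, pct = time_savings(t["manual"], t["automated"])
        out[disease] = {
            "saved_per_case_min": round(saved, 1),
            "relative_saving_pct": round(pct, 1),
            # Cohort total = mean time x number of cases (uses the means only).
            "total_manual_min": round(t["manual"] * n, 1),
            "total_automated_min": round(t["automated"] * n, 1),
        }
    return out


if __name__ == "__main__":
    for disease, row in summarize().items():
        print(disease, row)
```

Run as a script, it prints one summary per disease. For NSCLC the automated method saves about 6.1 min per case (roughly 59%), and for rectal cancer about 6.7 min per case (roughly 50%).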
Original language: English
Pages (from-to): 174-179
Journal: Radiotherapy and Oncology
Issue number: 1
Publication status: Published (Jul 2013)


  • Data warehouse
  • Clinical trials
  • Data quality
  • Efficiency
