Implementation of quality controls is essential to prevent batch effects in breathomics data and allow for cross-study comparisons

Georgios Stavropoulos, Daisy M. A. E. Jonkers, Zlatan Mujagic, Ger H. Koek, Ad A. M. Masclee, Marieke J. Pierik, Jan W. Dallinga, Frederik-Jan Van Schooten, Agnieszka Smolinska*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


Exhaled breath analysis has become a promising monitoring tool for various ailments by identifying volatile organic compounds (VOCs) excreted by the human body as indicative biomarkers. Throughout sampling, measurement, and data processing, non-biological variations are introduced into the data, leading to batch effects. Algorithmic approaches have been developed to cope with within-study batch effects. Batch differences, however, may also occur among different studies, and to date, methods to correct for cross-study batch effects are lacking; as a result, cross-study comparisons to verify the uniqueness of VOC profiles found for a specific disease can be challenging. This study applies within-study batch-effect-correction approaches to cross-study batch effects and makes suggestions that may help prevent the introduction of cross-study variations. Three batch-effect-correction algorithms were investigated: zero-centering, ComBat, and the analysis of covariance framework. The breath samples were collected from patients with inflammatory bowel disease (n = 213), chronic liver disease (n = 189), and irritable bowel syndrome (n = 261) at different periods and were analysed via gas chromatography-mass spectrometry. Multivariate statistics were used to visualise and verify the results. Visualisation of the data before any batch-effect correction showed a clear separation among the datasets of the three cohorts, probably due to batch effects. Visualisation of the three datasets after applying each of the three correction techniques showed that the batch effects were still present in the data. Predictions made using partial least squares discriminant analysis and random forest confirmed this observation. The within-study batch-effect-correction approaches therefore fail to correct for the cross-study batch effects present in the data.
The present study proposes a framework for systematically standardising future breathomics data by using internal standards or quality control samples at regular analysis intervals. Further knowledge of the nature of the unwanted variations among cross-study batches must be obtained to move the field forward.
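
Of the three correction approaches named above, zero-centering is the simplest to illustrate. The following is a minimal sketch in plain Python, not the authors' implementation: it assumes the data are arranged as one row per breath sample and one column per VOC feature, with a batch label per sample. The function name `zero_center_by_batch` and the toy values are hypothetical and serve illustration only.

```python
from collections import defaultdict


def zero_center_by_batch(samples, batches):
    """Subtract each batch's per-feature mean from the samples in that batch.

    samples: list of equal-length lists of floats
             (rows = samples, columns = VOC features; layout is an assumption)
    batches: list of batch labels, one per sample
    Returns a new list of rows; the inputs are not modified.
    """
    if len(samples) != len(batches):
        raise ValueError("samples and batches must have the same length")

    # Group the rows by batch label.
    rows_by_batch = defaultdict(list)
    for row, label in zip(samples, batches):
        rows_by_batch[label].append(row)

    # Per-batch, per-feature means.
    means = {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in rows_by_batch.items()
    }

    # Centre every sample on the mean of its own batch.
    return [
        [value - mean for value, mean in zip(row, means[label])]
        for row, label in zip(samples, batches)
    ]


# Toy data (hypothetical values): two batches, two features each.
toy_samples = [[1.0, 10.0], [3.0, 14.0], [10.0, 100.0], [20.0, 300.0]]
toy_batches = ["A", "A", "B", "B"]
centered = zero_center_by_batch(toy_samples, toy_batches)
```

After this step every batch has a mean of zero for each feature, so only offsets in the batch means are removed. Differences in variance or in correlation structure between batches are left untouched, which is one possible reason such a correction may fall short across studies.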

Original language: English
Article number: 026012
Number of pages: 12
Journal: Journal of Breath Research
Issue number: 2
Publication status: Published - Apr 2020


Keywords

  • exhaled breath
  • volatile organic compounds
  • VOCs
  • data analysis
  • batch effects
  • IBD
  • IBS
  • liver cirrhosis
