Machine learning model for classification of predominantly allergic and non-allergic asthma among preschool children with asthma hospitalization

Piyush Bhardwaj*, Ashish Tyagi, Shashank Tyagi, Joana Antão, Qichen Deng

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


OBJECTIVE: Asthma is the most frequent chronic airway illness in preschool children and is difficult to diagnose because of the disease's heterogeneity. This study aimed to compare different machine learning models and to identify the most effective one for classifying two forms of asthma in preschool children (predominantly allergic asthma and non-allergic asthma) using a minimal number of features.

METHODS: After pre-processing, 127 patients (70 with non-allergic asthma and 57 with predominantly allergic asthma) were chosen for final analysis from the Frankfurt dataset, which had asthma-related information on 205 patients. The Random Forest algorithm and the Chi-square test were used to select the key features from a total of 63 features. Six machine learning models (random forest, extreme gradient boosting, support vector machines, adaptive boosting, extra tree classifier, and logistic regression) were then trained and tested using 10-fold stratified cross-validation.
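As a rough illustration of what 10-fold stratified cross-validation means for a cohort of this size, the sketch below splits 127 labels (70 non-allergic, 57 predominantly allergic) into ten folds that each keep approximately the same class ratio. It is a minimal standard-library sketch, not the authors' pipeline: the function name `stratified_k_fold`, the round-robin assignment, and the random seed are assumptions made for illustration, and the labels are placeholders rather than patient data.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep class proportions roughly equal.

    Hypothetical helper for illustration only; not taken from the study.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into k folds.
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the test set; the remaining folds form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Placeholder labels matching the class counts reported in the abstract.
labels = ["non-allergic"] * 70 + ["allergic"] * 57
splits = list(stratified_k_fold(labels, k=10))

for fold_no, (train_idx, test_idx) in enumerate(splits, start=1):
    n_allergic = sum(labels[i] == "allergic" for i in test_idx)
    print(f"fold {fold_no}: {len(test_idx)} test samples, {n_allergic} allergic")
```

With these class counts, every test fold holds 7 non-allergic cases and 5 or 6 predominantly allergic cases, so each model is evaluated on folds that reflect the overall class balance.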

RESULTS: Among all features, age, weight, C-reactive protein, eosinophilic granulocytes, oxygen saturation, pre-medication inhaled corticosteroid + long-acting beta2-agonist (PM-ICS + LABA), PM-other (other pre-medication), H-Pulmicort/celestamine (Pulmicort/celestamine during hospitalization), and H-azithromycin (azithromycin during hospitalization) were found to be highly important. The support vector machine with a linear kernel differentiated between predominantly allergic asthma and non-allergic asthma with the highest accuracy (77.8%), a precision of 0.81, a true positive rate of 0.73, a true negative rate of 0.81, an F1 score of 0.81, and a ROC-AUC score of 0.79. Logistic regression was the second-best classifier, with an overall accuracy of 76.2%.
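For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, precision, true positive rate, true negative rate, and F1 score follow from the four cells of a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's confusion matrix. ROC-AUC is omitted because it depends on the classifier's continuous scores rather than on the confusion-matrix counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp/fn: positive cases predicted correctly/incorrectly.
    tn/fp: negative cases predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)            # share of positive predictions that are correct
    true_positive_rate = tp / (tp + fn)   # sensitivity (recall)
    true_negative_rate = tn / (tn + fp)   # specificity
    f1 = 2 * precision * true_positive_rate / (precision + true_positive_rate)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "true_positive_rate": true_positive_rate,
        "true_negative_rate": true_negative_rate,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not results from the study).
example = binary_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because F1 is the harmonic mean of precision and true positive rate, it always lies between those two values.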

CONCLUSION: Predominantly allergic and non-allergic asthma can be classified using machine learning approaches based on nine features.

Original language: English
Pages (from-to): 487-495
Number of pages: 9
Journal: Journal of Asthma
Issue number: 3
Early online date: 7 Apr 2022
Publication status: Published - 4 Mar 2023


  • Frankfurt dataset
  • Machine learning
  • Non-allergic asthma
  • Predominantly allergic asthma

