Machine learning model for classification of predominantly allergic and non-allergic asthma among preschool children with asthma hospitalization

Piyush Bhardwaj; Ashish Tyagi; Shashank Tyagi; Joana Antão; Qichen Deng

doi:10.1080/02770903.2022.2059763

Corresponding author: Piyush Bhardwaj

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

OBJECTIVE: Asthma is the most frequent chronic airway illness in preschool children and is difficult to diagnose because of the disease's heterogeneity. This study aimed to investigate different machine learning models and suggest the most effective one for classifying two forms of asthma in preschool children (predominantly allergic asthma and non-allergic asthma) using a minimum number of features.

METHODS: After pre-processing, 127 patients (70 with non-allergic asthma and 57 with predominantly allergic asthma) were chosen for final analysis from the Frankfurt dataset, which had asthma-related information on 205 patients. The Random Forest algorithm and the Chi-square test were used to select the key features from a total of 63 features. Six machine learning models (random forest, extreme gradient boosting, support vector machines, adaptive boosting, extra tree classifier, and logistic regression) were then trained and tested using 10-fold stratified cross-validation.
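The stratified split described above can be illustrated with a minimal sketch. This is not the authors' code: the function name `stratified_folds`, the label strings, the seed, and the round-robin dealing strategy are all assumptions made for illustration. It only shows how 127 samples with a 70/57 class split can be divided into 10 folds with similar class proportions.

```python
import random
from collections import Counter


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign sample indices to folds so each fold keeps a similar class ratio.

    Hypothetical helper for illustration; not taken from the study.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(n_folds)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class stopped
        # so that fold sizes stay balanced overall.
        for i, idx in enumerate(indices):
            folds[(offset + i) % n_folds].append(idx)
        offset += len(indices)
    return folds


# Class sizes taken from the abstract: 70 non-allergic, 57 predominantly allergic.
labels = ["non_allergic"] * 70 + ["allergic"] * 57
folds = stratified_folds(labels)

for k, test_idx in enumerate(folds):
    counts = Counter(labels[i] for i in test_idx)
    print(f"fold {k}: {len(test_idx)} test samples, {dict(counts)}")
```

In each round, one fold serves as the test set and the remaining nine as the training set, so every patient is used for testing exactly once.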

RESULTS: Among all features, age, weight, C-reactive protein, eosinophilic granulocytes, oxygen saturation, pre-medication inhaled corticosteroid + long-acting beta2-agonist (PM-ICS + LABA), PM-other (other pre-medication), H-Pulmicort/celestamine (Pulmicort/celestamine during hospitalization), and H-azithromycin (azithromycin during hospitalization) were found to be highly important. The support vector machine with a linear kernel differentiated between predominantly allergic asthma and non-allergic asthma with the highest accuracy (77.8%), a precision of 0.81, a true positive rate of 0.73, a true negative rate of 0.81, an F1 score of 0.81, and a ROC-AUC score of 0.79. Logistic regression was found to be the second-best classifier with an overall accuracy of 76.2%.
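For readers unfamiliar with the metrics reported above, the following sketch shows how accuracy, precision, true positive rate, true negative rate, and F1 score are commonly derived from a binary confusion matrix. The function name `binary_metrics` and the counts passed to it are hypothetical; they are not the study's confusion matrix and are not expected to reproduce the reported values. Which class the authors treated as positive is also an assumption here.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    tpr = tp / (tp + fn)        # true positive rate (sensitivity, recall)
    tnr = tn / (tn + fp)        # true negative rate (specificity)
    f1 = 2 * precision * tpr / (precision + tpr)  # harmonic mean of precision and TPR
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "tpr": tpr,
        "tnr": tnr,
        "f1": f1,
    }


# Hypothetical counts for 57 positive and 70 negative cases (illustration only).
metrics = binary_metrics(tp=40, fp=10, tn=60, fn=17)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```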

CONCLUSION: Predominantly allergic and non-allergic asthma can be classified using machine learning approaches based on nine features.

Original language	English
Pages (from-to)	487-495
Number of pages	9
Journal	Journal of Asthma
Volume	60
Issue number	3
Early online date	7 Apr 2022
DOIs	https://doi.org/10.1080/02770903.2022.2059763
Publication status	Published - 4 Mar 2023

Keywords

Frankfurt dataset
Machine learning
Non-allergic asthma
Predominantly allergic asthma

Cite this

@article{0d66956a4c094f8daffccd180ce4bebb,

title = "Machine learning model for classification of predominantly allergic and non-allergic asthma among preschool children with asthma hospitalization",

abstract = "OBJECTIVE: Asthma is the most frequent chronic airway illness in preschool children and is difficult to diagnose because of the disease's heterogeneity. This study aimed to investigate different machine learning models and suggest the most effective one for classifying two forms of asthma in preschool children (predominantly allergic asthma and non-allergic asthma) using a minimum number of features. METHODS: After pre-processing, 127 patients (70 with non-allergic asthma and 57 with predominantly allergic asthma) were chosen for final analysis from the Frankfurt dataset, which had asthma-related information on 205 patients. The Random Forest algorithm and the Chi-square test were used to select the key features from a total of 63 features. Six machine learning models (random forest, extreme gradient boosting, support vector machines, adaptive boosting, extra tree classifier, and logistic regression) were then trained and tested using 10-fold stratified cross-validation. RESULTS: Among all features, age, weight, C-reactive protein, eosinophilic granulocytes, oxygen saturation, pre-medication inhaled corticosteroid + long-acting beta2-agonist (PM-ICS + LABA), PM-other (other pre-medication), H-Pulmicort/celestamine (Pulmicort/celestamine during hospitalization), and H-azithromycin (azithromycin during hospitalization) were found to be highly important. The support vector machine with a linear kernel differentiated between predominantly allergic asthma and non-allergic asthma with the highest accuracy (77.8%), a precision of 0.81, a true positive rate of 0.73, a true negative rate of 0.81, an F1 score of 0.81, and a ROC-AUC score of 0.79. Logistic regression was found to be the second-best classifier with an overall accuracy of 76.2%. CONCLUSION: Predominantly allergic and non-allergic asthma can be classified using machine learning approaches based on nine features.",

keywords = "Frankfurt dataset, Machine learning, Non-allergic asthma, Predominantly allergic asthma",

author = "Piyush Bhardwaj and Ashish Tyagi and Shashank Tyagi and Joana Ant{\~a}o and Qichen Deng",

note = "Funding Information: We thank Stefan Zielen, Sven Kluge, Helena Donath, Katherina Bl{\"u}mchen, Jordis Trischlera, and Johannes Schulze from Klinikum Goethe University (KGU) for providing the Frankfurt dataset for the present study. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. Publisher Copyright: {\textcopyright} 2022 Taylor \& Francis Group, LLC.",

year = "2023",

month = mar,

day = "4",

doi = "10.1080/02770903.2022.2059763",

language = "English",

volume = "60",

pages = "487--495",

journal = "Journal of Asthma",

issn = "0277-0903",

publisher = "Informa Healthcare",

number = "3",

}

TY - JOUR

T1 - Machine learning model for classification of predominantly allergic and non-allergic asthma among preschool children with asthma hospitalization

AU - Bhardwaj, Piyush

AU - Tyagi, Ashish

AU - Tyagi, Shashank

AU - Antão, Joana

AU - Deng, Qichen

N1 - Funding Information: We thank Stefan Zielen, Sven Kluge, Helena Donath, Katherina Blümchen, Jordis Trischlera, and Johannes Schulze from Klinikum Goethe University (KGU) for providing the Frankfurt dataset for the present study. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. Publisher Copyright: © 2022 Taylor & Francis Group, LLC.

PY - 2023/3/4

Y1 - 2023/3/4

N2 - OBJECTIVE: Asthma is the most frequent chronic airway illness in preschool children and is difficult to diagnose because of the disease's heterogeneity. This study aimed to investigate different machine learning models and suggest the most effective one for classifying two forms of asthma in preschool children (predominantly allergic asthma and non-allergic asthma) using a minimum number of features. METHODS: After pre-processing, 127 patients (70 with non-allergic asthma and 57 with predominantly allergic asthma) were chosen for final analysis from the Frankfurt dataset, which had asthma-related information on 205 patients. The Random Forest algorithm and the Chi-square test were used to select the key features from a total of 63 features. Six machine learning models (random forest, extreme gradient boosting, support vector machines, adaptive boosting, extra tree classifier, and logistic regression) were then trained and tested using 10-fold stratified cross-validation. RESULTS: Among all features, age, weight, C-reactive protein, eosinophilic granulocytes, oxygen saturation, pre-medication inhaled corticosteroid + long-acting beta2-agonist (PM-ICS + LABA), PM-other (other pre-medication), H-Pulmicort/celestamine (Pulmicort/celestamine during hospitalization), and H-azithromycin (azithromycin during hospitalization) were found to be highly important. The support vector machine with a linear kernel differentiated between predominantly allergic asthma and non-allergic asthma with the highest accuracy (77.8%), a precision of 0.81, a true positive rate of 0.73, a true negative rate of 0.81, an F1 score of 0.81, and a ROC-AUC score of 0.79. Logistic regression was found to be the second-best classifier with an overall accuracy of 76.2%. CONCLUSION: Predominantly allergic and non-allergic asthma can be classified using machine learning approaches based on nine features.

AB - OBJECTIVE: Asthma is the most frequent chronic airway illness in preschool children and is difficult to diagnose because of the disease's heterogeneity. This study aimed to investigate different machine learning models and suggest the most effective one for classifying two forms of asthma in preschool children (predominantly allergic asthma and non-allergic asthma) using a minimum number of features. METHODS: After pre-processing, 127 patients (70 with non-allergic asthma and 57 with predominantly allergic asthma) were chosen for final analysis from the Frankfurt dataset, which had asthma-related information on 205 patients. The Random Forest algorithm and the Chi-square test were used to select the key features from a total of 63 features. Six machine learning models (random forest, extreme gradient boosting, support vector machines, adaptive boosting, extra tree classifier, and logistic regression) were then trained and tested using 10-fold stratified cross-validation. RESULTS: Among all features, age, weight, C-reactive protein, eosinophilic granulocytes, oxygen saturation, pre-medication inhaled corticosteroid + long-acting beta2-agonist (PM-ICS + LABA), PM-other (other pre-medication), H-Pulmicort/celestamine (Pulmicort/celestamine during hospitalization), and H-azithromycin (azithromycin during hospitalization) were found to be highly important. The support vector machine with a linear kernel differentiated between predominantly allergic asthma and non-allergic asthma with the highest accuracy (77.8%), a precision of 0.81, a true positive rate of 0.73, a true negative rate of 0.81, an F1 score of 0.81, and a ROC-AUC score of 0.79. Logistic regression was found to be the second-best classifier with an overall accuracy of 76.2%. CONCLUSION: Predominantly allergic and non-allergic asthma can be classified using machine learning approaches based on nine features.

KW - Frankfurt dataset

KW - Machine learning

KW - Non-allergic asthma

KW - Predominantly allergic asthma

U2 - 10.1080/02770903.2022.2059763

DO - 10.1080/02770903.2022.2059763

M3 - Article

C2 - 35344453

SN - 0277-0903

VL - 60

SP - 487

EP - 495

JO - Journal of Asthma

JF - Journal of Asthma

IS - 3

ER -