INTRODUCTION: Several factors affect the survival of End Stage Kidney Disease (ESKD) patients on dialysis. Machine learning (ML) models may help handle the multivariable, complex, and often non-linear predictors of adverse clinical events in ESKD patients. In this study, we used an advanced ML method and a traditional statistical method to develop mortality prediction models for hemodialysis (HD) patients and to compare the risk factors each method identified.
MATERIALS AND METHODS: We included HD patients who had data across a baseline period of at least 1 year and 1 day in the internationally representative Monitoring Dialysis Outcomes (MONDO) Initiative dataset. The twenty-three input parameters considered in the model were chosen a priori. The prediction model used 1 year of baseline data to predict death in the following 3 years. The dataset was randomly split into 80% training data and 20% testing data for model development. Two modeling techniques, XGBoost and logistic regression, were used to build the mortality prediction model.
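The random 80%/20% partition described above can be illustrated with a minimal sketch. This is not the study's code: the function name `split_train_test`, the fixed seed, and the integer patient identifiers are all hypothetical, and the abstract does not state whether the split was stratified by outcome.

```python
import random

def split_train_test(records, train_fraction=0.8, seed=0):
    """Randomly partition records into training and testing subsets.

    Assumption: a simple unstratified shuffle; the study may have
    split differently.
    """
    shuffled = list(records)               # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical patient identifiers standing in for the analysis sample.
patient_ids = list(range(1000))
train_ids, test_ids = split_train_test(patient_ids)
```

Keeping the test subset untouched during model fitting is what allows the test AUROC to serve as an estimate of performance on unseen patients.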
FINDINGS: A total of 95,142 patients were included in the analysis sample. The area under the receiver operating characteristic curve (AUROC) of the XGBoost ML model was 0.84 on the training data and 0.80 on the test data. The AUROC of the logistic regression model was 0.73 on the training data and 0.75 on the test data. Four of the top five predictors were common to both modeling strategies.
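For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen patient who survived. The sketch below computes it directly from that pairwise definition. The function name `auroc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def auroc(labels, scores):
    """AUROC from its pairwise definition.

    Counts the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied scores count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical): 1 = died during follow-up, 0 = survived.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
toy_auc = auroc(labels, scores)  # 8 of the 9 pairs are ranked correctly
```

The double loop is quadratic in the number of patients, so for a sample of this study's size a rank-based implementation from a statistics library would be the practical choice; the loop is used here only because it mirrors the definition.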
DISCUSSION: Using the internationally representative MONDO data for HD patients, we describe the development of an ML model and a traditional statistical model that were suitable for classifying a prevalent HD patient's 3-year risk of death. While both models had a reasonably high AUROC, the ML model identified hematocrit (HCT) levels as an important risk factor for mortality. If implemented in clinical practice, such proof-of-concept models could be used to provide pre-emptive care for HD patients.
|Number of pages||12|
|Early online date||20 Nov 2022|
|Publication status||Published - Jan 2023|