Background: We developed a machine learning (ML) model that predicts the risk that a patient on hemodialysis (HD) has an undetected SARS-CoV-2 infection that will be identified ≥3 days later.
Methods: As part of a healthcare operations effort, we used patient data from a national network of dialysis clinics (February-September 2020) to develop an ML model (XGBoost) that uses 81 variables to predict the likelihood that an adult patient on HD has an undetected SARS-CoV-2 infection that will be identified ≥3 days later. We used a 60%:20%:20% randomized split of COVID-19-positive samples for the training, validation, and testing datasets.
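As an illustration of the 60%:20%:20% randomized split described in the Methods, the following is a minimal sketch using only the Python standard library. The function name `split_60_20_20`, the fixed seed, and the integer patient identifiers are hypothetical; the abstract does not describe how the split was actually implemented.

```python
import random


def split_60_20_20(sample_ids, seed=0):
    """Randomly partition sample_ids into training, validation, and testing lists.

    Assumption: a simple shuffle followed by 60%/20%/20% cuts. The seed is
    arbitrary and only makes the illustration reproducible.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    cut_train = n * 60 // 100  # end of the training portion
    cut_val = n * 80 // 100    # end of the validation portion
    return ids[:cut_train], ids[cut_train:cut_val], ids[cut_val:]


# Hypothetical identifiers standing in for the 11,166 COVID-19-positive patients.
train, val, test = split_60_20_20(range(11166))
print(len(train), len(val), len(test))
```

With 11,166 samples, integer cut points leave the remainder in the testing portion, so the three parts are only approximately 60%, 20%, and 20%.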
Results: We used a select cohort of 40,490 patients on HD to build the ML model (11,166 patients who were COVID-19 positive and 29,324 patients who were unaffected controls). The prevalence of COVID-19 in the cohort (28% COVID-19 positive) was, by design, higher than in the HD population. The prevalence of COVID-19 was set to 10% in the testing dataset to approximate the prevalence observed in the national HD population. The threshold for classifying observations as positive or negative was set at 0.80 to minimize false positives. In the testing dataset, the model had a precision of 0.52, a recall of 0.07, and a lift of 5.3. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were 0.68 and 0.24, respectively, in the testing dataset. Top predictors of a patient on HD having a SARS-CoV-2 infection were the change in interdialytic weight gain from the previous month, mean pre-HD body temperature in the prior week, and the change in post-HD heart rate from the previous month.
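To make the reported threshold-based metrics concrete, the sketch below computes precision, recall, and lift at a classification threshold of 0.80. It assumes the common definition of lift as precision divided by prevalence; the abstract does not state the definition the authors used. Under that assumption, 0.52 / 0.10 gives about 5.2, close to the reported 5.3. The function name `metrics_at_threshold` and the toy labels and scores are hypothetical and are not study data.

```python
def metrics_at_threshold(y_true, scores, threshold=0.80):
    """Return precision, recall, prevalence, and lift at a score threshold.

    Assumption: lift = precision / prevalence, where prevalence is the
    fraction of positive labels in the evaluated dataset.
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p)
    fp = sum(1 for y, p in zip(y_true, predicted) if y == 0 and p)
    fn = sum(1 for y, p in zip(y_true, predicted) if y == 1 and not p)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    prevalence = sum(y_true) / len(y_true)
    lift = precision / prevalence if prevalence else float("nan")
    return {
        "precision": precision,
        "recall": recall,
        "prevalence": prevalence,
        "lift": lift,
    }


# Toy example (hypothetical values): 2 positives among 10 observations.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
scores = [0.90, 0.85, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.30]
print(metrics_at_threshold(y_true, scores))
```

A high threshold such as 0.80 flags few observations, which tends to raise precision at the cost of recall; this is consistent with the pattern reported above (precision 0.52, recall 0.07).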
Conclusions: The developed ML model appears suitable for predicting patients on HD at risk of having COVID-19 at least 3 days before there would be a clinical suspicion of the disease.
| Number of pages | 13 |
| Publication status | Published - 25 Mar 2021 |
- Machine Learning
- ROC Curve
- Renal Dialysis