Identifying type 1 / type 2 diabetes in medico-administrative database to improve health surveillance, medical research and prevention in diabetes: Algorithm development and application

Sonsoles Fuentes, Rok Hrzic, Romana Haneef, Sofiane Kab, Emmanuel Cosson, Sandrine Fosse-Edorh*

*Corresponding author for this work

Research output: Contribution to journal (Article, academic, peer-reviewed)

Abstract

Introduction: Big data sources represent an opportunity for diabetes research. One example is the French national health data system (SNDS), which gathers medical claims for out-of-hospital health care and hospitalizations for the entire French population (66 million people). A validated algorithm based on antidiabetic drug reimbursements can currently identify people with pharmacologically treated diabetes in the SNDS, but it cannot distinguish type 1 from type 2 diabetes. This distinction is crucial for diabetes surveillance, because the two types differ in prevention, populations at risk, natural history, pathophysiology, management and risk of complications. This article describes the development of a type 1/type 2 diabetes classification algorithm using artificial intelligence and its application to estimating the prevalence of type 1 and type 2 diabetes in France.
Methods: The final data set comprised all diabetes cases from the CONSTANCES cohort (n = 951). A supervised machine learning method with eight steps was used: final data set selection, target definition (type 1 diabetes), feature coding, splitting of the final data set into training and testing data sets, feature selection, training, validation, and algorithm selection. The selected algorithm was applied to SNDS data to estimate the prevalence of type 1 and type 2 diabetes among adults aged 18–70 years.
Results: Of the 3481 SNDS features, 14 were selected to train the candidate algorithms. The final algorithm was a linear discriminant analysis model based on the number of reimbursements for fast-acting insulin, long-acting insulin and biguanides over the previous year (specificity 97%, sensitivity 100%). In 2016, after adjustment for algorithm performance, the prevalence of type 1 and type 2 diabetes in France was estimated at 0.3% and 4.4%, respectively.
Conclusion: Our type 1/type 2 classification algorithm was found to perform well and to be applicable to any prescription or medical claims database in other countries. Artificial intelligence opens new possibilities for research and diabetes prevention.
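The classification and prevalence-adjustment steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and makes several assumptions: it fits a two-class linear discriminant analysis on the three yearly reimbursement counts named in the Results, uses invented toy data in place of CONSTANCES or SNDS records, skips the feature-selection step over the 3481 SNDS features, and corrects the apparent type 1 proportion with the Rogan-Gladen estimator, a standard misclassification correction that may differ from the adjustment the authors actually applied. All function and variable names (`fit_lda`, `rogan_gladen`, `TYPE1`, etc.) are hypothetical.

```python
from __future__ import annotations

from math import log

# Toy data, invented for illustration (NOT CONSTANCES/SNDS values).
# Each row: yearly reimbursement counts for
# [fast-acting insulin, long-acting insulin, biguanides].
TYPE1 = [[10, 11, 0], [12, 12, 1], [9, 12, 0], [11, 10, 0], [13, 11, 2]]
TYPE2 = [[0, 0, 12], [0, 3, 11], [1, 6, 12], [0, 0, 10],
         [0, 12, 12], [2, 8, 9], [0, 0, 11], [0, 1, 12]]


def mean(rows: list[list[float]]) -> list[float]:
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_covariance(groups, ridge=1e-6):
    """Within-class covariance pooled over groups; the tiny ridge on the
    diagonal is an assumption of this sketch to guard against singularity."""
    d = len(groups[0][0])
    dof = sum(len(g) for g in groups) - len(groups)
    s = [[0.0] * d for _ in range(d)]
    for g in groups:
        mu = mean(g)
        for row in g:
            dev = [x - m for x, m in zip(row, mu)]
            for i in range(d):
                for j in range(d):
                    s[i][j] += dev[i] * dev[j] / dof
    for i in range(d):
        s[i][i] += ridge
    return s


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_lda(type1, type2):
    """Two-class LDA: weights w = S^-1 (mu1 - mu2) and a threshold that
    includes the log prior ratio estimated from class frequencies."""
    mu1, mu2 = mean(type1), mean(type2)
    w = solve(pooled_covariance([type1, type2]),
              [a - b for a, b in zip(mu1, mu2)])
    midpoint = sum(wi * (a + b) / 2 for wi, a, b in zip(w, mu1, mu2))
    threshold = midpoint - log(len(type1) / len(type2))
    return w, threshold


def predict_type1(w, threshold, x) -> bool:
    """True if the discriminant score w.x exceeds the threshold."""
    return sum(wi * xi for wi, xi in zip(w, x)) > threshold


def sensitivity_specificity(w, threshold, type1, type2):
    """Sensitivity/specificity with type 1 diabetes as the positive class."""
    sens = sum(predict_type1(w, threshold, x) for x in type1) / len(type1)
    spec = sum(not predict_type1(w, threshold, x) for x in type2) / len(type2)
    return sens, spec


def rogan_gladen(apparent, sens, spec):
    """Correct an apparent proportion for misclassification (Rogan-Gladen),
    clipped to [0, 1]. Requires sens + spec > 1."""
    denom = sens + spec - 1
    if denom <= 0:
        raise ValueError("sens + spec must exceed 1")
    return min(1.0, max(0.0, (apparent + spec - 1) / denom))


if __name__ == "__main__":
    w, t = fit_lda(TYPE1, TYPE2)
    print("weights:", [round(v, 3) for v in w])
    # In practice sensitivity/specificity come from the held-out test set;
    # training-set values on toy data are shown only for demonstration.
    print("sens/spec on toy data:", sensitivity_specificity(w, t, TYPE1, TYPE2))
```

With the reported sensitivity of 100% and specificity of 97% (type 1 as the positive class), `rogan_gladen(p, 1.0, 0.97)` reduces to `(p - 0.03) / 0.97`: when sensitivity is perfect, the correction only removes the share of type 2 cases misclassified as type 1.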
Original language: English
Article number: 100137
Number of pages: 7
Journal: Diabetes Epidemiology and Management
Volume: 10
Issue number: 1
Publication status: Published, 1 Apr 2023

Keywords

  • Algorithm
  • Machine learning
  • Type 1 diabetes
  • Type 2 diabetes and prevalence
