TY - JOUR
T1 - Enhancing Precision in Detecting Severe Immune-Related Adverse Events
T2 - Comparative Analysis of Large Language Models and International Classification of Disease Codes in Patient Records
AU - Sun, Virginia H
AU - Heemelaar, Julius C
AU - Hadzic, Ibrahim
AU - Raghu, Vineet K
AU - Wu, Chia-Yun
AU - Zubiri, Leyre
AU - Ghamari, Azin
AU - LeBoeuf, Nicole R
AU - Abu-Shawer, Osama
AU - Kehl, Kenneth L
AU - Grover, Shilpa
AU - Singh, Prabhsimranjot
AU - Suero-Abreu, Giselle A
AU - Wu, Jessica
AU - Falade, Ayo S
AU - Grealish, Kelley
AU - Thomas, Molly F
AU - Hathaway, Nora
AU - Medoff, Benjamin D
AU - Gilman, Hannah K
AU - Villani, Alexandra-Chloe
AU - Ho, Jor Sam
AU - Mooradian, Meghan J
AU - Sise, Meghan E
AU - Zlotoff, Daniel A
AU - Blum, Steven M
AU - Dougan, Michael
AU - Sullivan, Ryan J
AU - Neilan, Tomas G
AU - Reynolds, Kerry L
PY - 2024/9/3
Y1 - 2024/9/3
N2 - PURPOSE: Current approaches to accurately identify immune-related adverse events (irAEs) in large retrospective studies are limited. Large language models (LLMs) offer a potential solution to this challenge, given their high performance in natural language comprehension tasks. Therefore, we investigated the use of an LLM to identify irAEs among hospitalized patients, comparing its performance with manual adjudication and International Classification of Disease (ICD) codes. METHODS: Hospital admissions of patients receiving immune checkpoint inhibitor (ICI) therapy at a single institution from February 5, 2011, to September 5, 2023, were individually reviewed and adjudicated for the presence of irAEs. ICD codes and an LLM with retrieval-augmented generation were applied to detect frequent irAEs (ICI-induced colitis, hepatitis, and pneumonitis) and the most fatal irAE (ICI-myocarditis) from electronic health records. The performance between ICD codes and LLM was compared via sensitivity and specificity with an α = .05, relative to the gold standard of manual adjudication. External validation was performed using a data set of hospital admissions from June 1, 2018, to May 31, 2019, from a second institution. RESULTS: Of the 7,555 admissions for patients on ICI therapy in the initial cohort, 2.0% were adjudicated to be due to ICI-colitis, 1.1% ICI-hepatitis, 0.7% ICI-pneumonitis, and 0.8% ICI-myocarditis. The LLM demonstrated higher sensitivity than ICD codes (94.7% v 68.7%), achieving significance for ICI-hepatitis (P < .001), myocarditis (P < .001), and pneumonitis (P = .003) while yielding similar specificities (93.7% v 92.4%). The LLM spent an average of 9.53 seconds/chart in comparison with an estimated 15 minutes for adjudication. In the validation cohort (N = 1,270), the mean LLM sensitivity and specificity were 98.1% and 95.7%, respectively. CONCLUSION: LLMs are a useful tool for the detection of irAEs, outperforming ICD codes in sensitivity and adjudication in efficiency.
AB - PURPOSE: Current approaches to accurately identify immune-related adverse events (irAEs) in large retrospective studies are limited. Large language models (LLMs) offer a potential solution to this challenge, given their high performance in natural language comprehension tasks. Therefore, we investigated the use of an LLM to identify irAEs among hospitalized patients, comparing its performance with manual adjudication and International Classification of Disease (ICD) codes. METHODS: Hospital admissions of patients receiving immune checkpoint inhibitor (ICI) therapy at a single institution from February 5, 2011, to September 5, 2023, were individually reviewed and adjudicated for the presence of irAEs. ICD codes and an LLM with retrieval-augmented generation were applied to detect frequent irAEs (ICI-induced colitis, hepatitis, and pneumonitis) and the most fatal irAE (ICI-myocarditis) from electronic health records. The performance between ICD codes and LLM was compared via sensitivity and specificity with an α = .05, relative to the gold standard of manual adjudication. External validation was performed using a data set of hospital admissions from June 1, 2018, to May 31, 2019, from a second institution. RESULTS: Of the 7,555 admissions for patients on ICI therapy in the initial cohort, 2.0% were adjudicated to be due to ICI-colitis, 1.1% ICI-hepatitis, 0.7% ICI-pneumonitis, and 0.8% ICI-myocarditis. The LLM demonstrated higher sensitivity than ICD codes (94.7% v 68.7%), achieving significance for ICI-hepatitis (P < .001), myocarditis (P < .001), and pneumonitis (P = .003) while yielding similar specificities (93.7% v 92.4%). The LLM spent an average of 9.53 seconds/chart in comparison with an estimated 15 minutes for adjudication. In the validation cohort (N = 1,270), the mean LLM sensitivity and specificity were 98.1% and 95.7%, respectively. CONCLUSION: LLMs are a useful tool for the detection of irAEs, outperforming ICD codes in sensitivity and adjudication in efficiency.
U2 - 10.1200/JCO.24.00326
DO - 10.1200/JCO.24.00326
M3 - Article
SN - 0732-183X
JO - Journal of Clinical Oncology
JF - Journal of Clinical Oncology
M1 - 2400326
ER -