Reliability of Large Language Model Knowledge Across Brand and Generic Cancer Drug Names

Jack Gallifant, Shan Chen, Sandeep K. Jain, Pedro Moreira, Umit Topaloglu, Hugo J.W.L. Aerts, Jeremy L. Warner, William G. La Cava, Danielle S. Bitterman*

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-review

Abstract

PURPOSE
To evaluate the performance and consistency of large language models (LLMs) across brand and generic oncology drug names in various clinical tasks, addressing concerns that LLM performance may fluctuate with subtle phrasing differences in ways that could affect patient care.

METHODS
This study evaluated three LLMs (GPT-3.5-turbo-0125, GPT-4-turbo, and GPT-4o) using drug names from the HemOnc ontology. The assessment included 367 generic-to-brand and 2,516 brand-to-generic pairs, 1,000 synthetic drug-drug interaction (DDI) patient cases, and 2,438 immune-related adverse event (irAE) cases. LLMs were tested on drug name recognition, word association, DDI detection, and irAE diagnosis using both brand and generic drug names.

RESULTS
LLMs demonstrated high accuracy in matching brand and generic names (GPT-4o: 97.38% for brand, 94.71% for generic, P < .01). However, they showed significant inconsistencies in word association tasks. GPT-3.5-turbo-0125 exhibited biases favoring brand names when associating drugs with effectiveness (odds ratio [OR], 1.43, P < .05) and with being side-effect-free (OR, 1.76, P < .05). DDI detection accuracy was poor across all models (<26%), with no significant differences between brand and generic names. Sentiment analysis revealed significant differences, particularly for GPT-3.5-turbo-0125 (brand mean 0.67, generic mean 0.95, P < .01). Consistency in irAE diagnosis varied across models.

CONCLUSION
Despite high proficiency in name matching, LLMs exhibit inconsistencies when processing brand versus generic drug names in more complex tasks. These findings highlight the need for increased awareness, improved methods for assessing robustness, and more consistent systems for handling nomenclature variations in clinical applications of LLMs.
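
The abstract does not say how the paired brand/generic comparisons were computed. As a minimal sketch, assuming each case is posed once with the brand name and once with the generic name, the functions below show standard statistics for such paired data: an odds ratio with a Woolf (log-scale) confidence interval for word-association counts, simple answer agreement as a consistency measure, and an exact McNemar test for paired correctness. The function names and the 2×2 cell layout are hypothetical illustrations, not the authors' code, and the study may have used different tests.

```python
import math
from typing import Sequence


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    Assumed, hypothetical layout (not taken from the paper):
        a = brand name, attribute assigned     b = brand name, attribute not assigned
        c = generic name, attribute assigned   d = generic name, attribute not assigned
    If any cell is zero, 0.5 is added to every cell (Haldane-Anscombe correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


def paired_agreement(brand_answers: Sequence[str], generic_answers: Sequence[str]) -> float:
    """Fraction of cases where the model gives the same answer under both names."""
    if len(brand_answers) != len(generic_answers) or not brand_answers:
        raise ValueError("need two non-empty answer lists of equal length")
    same = sum(x == y for x, y in zip(brand_answers, generic_answers))
    return same / len(brand_answers)


def mcnemar_exact(brand_correct: Sequence[bool], generic_correct: Sequence[bool]):
    """Exact (binomial) McNemar test on paired correctness.

    Only discordant pairs matter: n10 = brand right / generic wrong,
    n01 = generic right / brand wrong. Under the null hypothesis each
    discordant pair is equally likely to fall either way, so the smaller
    count follows Binomial(n10 + n01, 0.5). Returns (n10, n01, two-sided p).
    """
    if len(brand_correct) != len(generic_correct):
        raise ValueError("paired lists must have equal length")
    n10 = sum(bc and not gc for bc, gc in zip(brand_correct, generic_correct))
    n01 = sum(gc and not bc for bc, gc in zip(brand_correct, generic_correct))
    n = n10 + n01
    if n == 0:
        return n10, n01, 1.0  # no discordant pairs: no evidence of a difference
    k = min(n10, n01)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return n10, n01, min(1.0, 2 * tail)
```

With invented toy counts a=30, b=10, c=10, d=30 (not study data), `odds_ratio_woolf` returns an odds ratio of 9.0; under this assumed layout, an OR above 1 would mean the attribute is assigned more often when the brand name is used. The McNemar test is chosen because each case is answered under both names, so the two accuracy estimates are not independent samples.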
Original language: English
Article number: e2400257
Journal: JCO Clinical Cancer Informatics
Volume: 9
Publication status: Published, 1 Jun 2025
