Abstract
Large language models (LLMs) are increasingly essential in processing natural languages, yet their application is frequently compromised by biases and inaccuracies originating in their training data. In this study, we introduce Cross-Care, the first benchmark framework dedicated to assessing biases and real-world knowledge in LLMs, specifically focusing on the representation of disease prevalence across diverse demographic groups. We systematically evaluate how demographic biases embedded in pre-training corpora such as The Pile influence the outputs of LLMs. We expose and quantify discrepancies by juxtaposing these biases against actual disease prevalences in various U.S. demographic groups. Our results highlight substantial misalignment between LLM representation of disease prevalence and real disease prevalence rates across demographic subgroups, indicating a pronounced risk of bias propagation and a lack of real-world grounding for medical applications of LLMs. Furthermore, we observe that various alignment methods minimally resolve inconsistencies in the models' representation of disease prevalence across different languages. For further exploration and analysis, we make all data and a data visualization tool available at: www.crosscare.net.
Original language | English |
---|---|
Title of host publication | 38th Conference on Neural Information Processing Systems |
Subtitle of host publication | NeurIPS 2024 |
Editors | A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, C. Zhang |
Place of Publication | Vancouver |
Publisher | Neural Information Processing Systems Foundation |
Volume | 37 |
ISBN (Print) | 10495258 |
Publication status | Published - 1 Jan 2024 |
Event | 38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada; Duration: 10 Dec 2024 → 15 Dec 2024; Conference number: 38; https://neurips.cc/Conferences/2024 |
Publication series
Series | Advances in Neural Information Processing Systems |
---|---|
ISSN | 1049-5258 |
Conference
Conference | 38th Conference on Neural Information Processing Systems, NeurIPS 2024 |
---|---|
Abbreviated title | NeurIPS 2024 |
Country/Territory | Canada |
City | Vancouver |
Period | 10/12/24 → 15/12/24 |
Internet address |