TY - JOUR
T1 - Do Large Language Model Chatbots perform better than established patient information resources in answering patient questions? A comparative study on melanoma
AU - Kamminga, Nadia C W
AU - Kievits, June E C
AU - Plaisier, Peter W
AU - Burgers, Jake S
AU - van der Veldt, Astrid M
AU - van den Brand, J A G J
AU - Mulder, M
AU - Wakkee, Marlies
AU - Lugtenberg, Marjolein
AU - Nijsten, Tamar
PY - 2024
Y1 - 2024
N2 - BACKGROUND: Large Language Models (LLMs) have a potential role in providing adequate patient information. OBJECTIVES: To compare the quality of LLMs' responses with that of established Dutch patient information resources (PIRs) in answering patient questions regarding melanoma. METHODS: Responses from ChatGPT versions 3.5 and 4.0, Gemini, and three leading Dutch melanoma PIRs to 50 melanoma-specific questions were examined at baseline and, for LLMs, again after eight months. Outcomes included (medical) accuracy, completeness, personalisation, and readability, and additionally reproducibility for LLMs. Comparative analyses were performed within LLMs and within PIRs using Friedman's ANOVA, and between the best-performing LLMs and the gold-standard PIR using the Wilcoxon signed-rank test. RESULTS: Within LLMs, ChatGPT-3.5 demonstrated the highest accuracy (p=0.009). Gemini performed best in completeness (p<0.001), personalisation (p=0.007), and readability (p<0.001). PIRs were consistent in accuracy and completeness, with the general practitioner's website excelling in personalisation (p=0.013) and readability (p<0.001). The best-performing LLMs outperformed the gold-standard PIR on all criteria except accuracy. Over time, response reproducibility decreased for all LLMs, with variability across outcomes. CONCLUSIONS: Although LLMs show potential in providing highly personalised and complete responses to patient questions regarding melanoma, improving and safeguarding accuracy, reproducibility and accessibility is crucial before they can replace or complement conventional PIRs. This study compared the quality of responses from Large Language Models (LLMs) with that of established Dutch patient information resources (PIRs) for melanoma-related patient questions. Results showed that LLMs provided highly personalised and complete answers, often surpassing PIRs. However, improving and safeguarding accuracy, reproducibility and accessibility is crucial before they can replace or complement conventional PIRs.
U2 - 10.1093/bjd/ljae377
DO - 10.1093/bjd/ljae377
M3 - Article
SN - 0007-0963
JO - British Journal of Dermatology
JF - British Journal of Dermatology
M1 - ljae377
ER -