TY - JOUR
T1 - Can ChatGPT provide responses to patients for orthopaedic-related questions? A comparison between ChatGPT and medical support staff
AU - Jacobs, Maud
AU - van der Weegen, Walter
AU - Savelberg, Hans
AU - de Bie, Rob
AU - van Beek, Rienk
AU - Kuipers, Joost
AU - Pilot, Peter
PY - 2026/1
Y1 - 2026/1
N2 - Introduction: Patient Engagement Platforms, particularly their chat functionalities, can potentially improve communication but may also heighten workload, contributing to burnout among healthcare professionals. Advances in Natural Language Processing, such as ChatGPT and Med-PaLM, offer human-like responses to a wide range of questions, but concerns about their use in healthcare remain. This study evaluates whether Large Language Models can respond to patient questions as well as support staff in terms of quality and empathy. Methods: In this cross-sectional study, 111 patient questions on lower limb arthroplasty, answered by support staff via an app, were selected. These questions were entered into ChatGPT 3.5 to generate responses, which were collected on July 2 and 3, 2024. Two blinded healthcare professionals, an orthopaedic surgeon and an anaesthetist, evaluated the responses generated by ChatGPT and by support staff on quality, empathy, and risk of potential adverse events, selected their preferred response, and identified which response they thought was generated by ChatGPT. A Patient Panel (n = 29) also assessed the responses on empathy, preference, and perceived source. Results: Fifty questions were available for a comparative analysis between ChatGPT and support staff responses. No difference in quality was found between ChatGPT and support staff (p = 0.075), though ChatGPT was rated as more empathetic (p < 0.001). No difference was found between the two response types in the risk of incorrect treatment (p = 0.377). Physicians identified ChatGPT's responses in 84–90% of cases. The Patient Panel found ChatGPT to be more empathetic (p < 0.001) but showed no preference for ChatGPT (p = 0.086). Patients accurately identified ChatGPT's responses in 34.5% of cases (p = 0.005). Three ChatGPT responses contained high-risk errors. Conclusion: This study shows that ChatGPT generated high-quality and empathetic responses to patient questions about lower limb arthroplasty. Further investigation is needed to optimize clinical use, but the high appreciation for ChatGPT's responses highlights its potential for use in clinical practice in the near future.
AB - Introduction: Patient Engagement Platforms, particularly their chat functionalities, can potentially improve communication but may also heighten workload, contributing to burnout among healthcare professionals. Advances in Natural Language Processing, such as ChatGPT and Med-PaLM, offer human-like responses to a wide range of questions, but concerns about their use in healthcare remain. This study evaluates whether Large Language Models can respond to patient questions as well as support staff in terms of quality and empathy. Methods: In this cross-sectional study, 111 patient questions on lower limb arthroplasty, answered by support staff via an app, were selected. These questions were entered into ChatGPT 3.5 to generate responses, which were collected on July 2 and 3, 2024. Two blinded healthcare professionals, an orthopaedic surgeon and an anaesthetist, evaluated the responses generated by ChatGPT and by support staff on quality, empathy, and risk of potential adverse events, selected their preferred response, and identified which response they thought was generated by ChatGPT. A Patient Panel (n = 29) also assessed the responses on empathy, preference, and perceived source. Results: Fifty questions were available for a comparative analysis between ChatGPT and support staff responses. No difference in quality was found between ChatGPT and support staff (p = 0.075), though ChatGPT was rated as more empathetic (p < 0.001). No difference was found between the two response types in the risk of incorrect treatment (p = 0.377). Physicians identified ChatGPT's responses in 84–90% of cases. The Patient Panel found ChatGPT to be more empathetic (p < 0.001) but showed no preference for ChatGPT (p = 0.086). Patients accurately identified ChatGPT's responses in 34.5% of cases (p = 0.005). Three ChatGPT responses contained high-risk errors. Conclusion: This study shows that ChatGPT generated high-quality and empathetic responses to patient questions about lower limb arthroplasty. Further investigation is needed to optimize clinical use, but the high appreciation for ChatGPT's responses highlights its potential for use in clinical practice in the near future.
KW - Artificial intelligence
KW - ChatGPT
KW - Large language model
KW - Lower limb arthroplasty
U2 - 10.1016/j.pec.2025.109333
DO - 10.1016/j.pec.2025.109333
M3 - Article
SN - 0738-3991
VL - 142
JO - Patient Education and Counseling
JF - Patient Education and Counseling
M1 - 109333
ER -