Can ChatGPT provide responses to patients for orthopaedic-related questions? A comparison between ChatGPT and medical support staff

Maud Jacobs, Walter van der Weegen*, Hans Savelberg, Rob de Bie, Rienk van Beek, Joost Kuipers, Peter Pilot

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Introduction: Patient Engagement Platforms, particularly chat functionalities, can improve communication but may also increase workload, contributing to burnout among healthcare professionals. Advances in Natural Language Processing, such as ChatGPT and Med-PaLM, offer human-like responses to a wide range of questions, but concerns about their use in healthcare remain. This study evaluates whether Large Language Models can respond to patient questions as well as support staff in terms of quality and empathy.

Methods: In this cross-sectional study, 111 patient questions on lower limb arthroplasty, answered by support staff via an app, were selected. These questions were entered into ChatGPT 3.5 to generate responses, which were collected on July 2 and 3, 2024. Two blinded healthcare professionals, an orthopaedic surgeon and an anesthetist, evaluated the responses from both ChatGPT and support staff on quality, empathy, and risk of potential adverse events, selected their preferred response, and identified which response they thought was ChatGPT's. A Patient Panel (n = 29) also assessed the responses on empathy, preference, and source.

Results: Fifty questions were available for a comparative analysis between ChatGPT and support staff responses. No difference in quality was found between ChatGPT and support staff (p = 0.075), though ChatGPT was rated as more empathetic (p < 0.001). No difference was found between the two response types in the risk of incorrect treatment (p = 0.377). Physicians identified ChatGPT's responses in 84–90 % of cases. The Patient Panel found ChatGPT to be more empathetic (p < 0.001) but showed no preference for ChatGPT (p = 0.086). Patients accurately identified ChatGPT's responses in 34.5 % of cases (p = 0.005). Three ChatGPT responses contained high-risk errors.

Conclusion: This study shows that ChatGPT generated high-quality and empathetic responses to patient questions about lower limb arthroplasty. Further investigation is needed to optimize clinical use, but the high appreciation for ChatGPT responses highlights its potential for use in clinical practice in the near future.

Original language: English
Article number: 109333
Number of pages: 13
Journal: Patient Education and Counseling
Volume: 142
DOIs
Publication status: Published - Jan 2026

Keywords

  • Artificial intelligence
  • ChatGPT
  • Large language model
  • Lower limb arthroplasty
