Comparing the performance of a large language model and naive human interviewers in interviewing children about a witnessed mock-event

Yongjie Sun*, Haohai Pang, Liisa Jaervilehto, Ophelia Zhang, David Shapiro, Julia Korkman, Shumpei Haginoya, Pekka Santtila

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Purpose: The present study compared the performance of a Large Language Model (LLM; ChatGPT) and human interviewers in interviewing children about a mock event they witnessed.

Methods: Children aged 6–8 years (N = 78) were randomly assigned to the LLM condition (n = 40) or the human interviewer condition (n = 38). In the experiment, the children watched a video filmed by the researchers that depicted behavior including elements that could be misinterpreted as abusive in other contexts, and then answered questions posed by either an LLM (presented by a human researcher) or a human interviewer.

Results: Irrespective of condition, recommended (vs. not recommended) questions elicited more correct information. The LLM posed fewer questions overall, but there was no difference in the proportion of questions recommended by the literature. There was no difference between the LLM and human interviewers in the amount of unique correct information elicited, but questions posed by the LLM (vs. humans) elicited more unique correct information per question. The LLM (vs. humans) also elicited less false information overall, but there was no difference in false information elicited per question.

Conclusions: The findings show that the LLM was competent in formulating questions that adhere to best-practice guidelines, while human interviewers asked more questions following up on the children's responses in trying to find out what the children had witnessed. The results indicate that LLMs could possibly be used to support child investigative interviewers. However, substantial further investigation is warranted to ascertain the utility of LLMs in more realistic investigative interview settings.
Original language: English
Article number: e0316317
Number of pages: 25
Journal: PLOS ONE
Volume: 20
Issue number: 2
DOIs
Publication status: Published - 28 Feb 2025

Keywords

  • PSYCHOLOGICAL REFRACTORY PERIOD
  • SEXUAL-ABUSE INTERVIEWS
  • INVESTIGATIVE INTERVIEWS
  • INDIVIDUAL-DIFFERENCES
  • QUESTIONING STYLE
  • COGNITIVE LOAD
  • PROTOCOL
  • MEMORY
  • PROMPTS
  • FEEDBACK