
I see what you mean: Co-Speech Gestures for Reference Resolution in Multimodal Dialogue

  • Esam Ghaleb*
  • Bulat Khaertdinov
  • Asli Özyürek
  • Raquel Fernández

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference article in proceeding (Academic, peer-reviewed)

Abstract

In face-to-face interaction, we use multiple modalities, including speech and gestures, to communicate information and resolve references to objects. However, how representational co-speech gestures refer to objects remains understudied from a computational perspective. In this work, we address this gap by introducing a multimodal reference resolution task centred on representational gestures, while simultaneously tackling the challenge of learning robust gesture embeddings. We propose a self-supervised pre-training approach to gesture representation learning that grounds body movements in spoken language. Our experiments show that the learned embeddings align with expert annotations and have significant predictive power. Moreover, reference resolution accuracy further improves when (1) using multimodal gesture representations, even when speech is unavailable at inference time, and (2) leveraging dialogue history. Overall, our findings highlight the complementary roles of gesture and speech in reference resolution, offering a step towards more naturalistic models of human-machine interaction.
Original language: English
Title of host publication: Findings of the Association for Computational Linguistics: ACL 2025
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Publisher: Association for Computational Linguistics (ACL)
Pages: 13191-13206
Number of pages: 16
ISBN (Electronic): 9798891762565
DOIs
Publication status: Published - 2025
Event: 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 - Vienna, Austria
Duration: 27 Jul 2025 - 1 Aug 2025
https://2025.aclweb.org/

Publication series

Series: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN: 0736-587X

Conference

Conference: 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Abbreviated title: ACL 2025
Country/Territory: Austria
City: Vienna
Period: 27/07/25 - 1/08/25
Internet address
