ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

Automated Audio Captioning is a multimodal task that aims to convert audio content into natural language. The performance of audio captioning systems is evaluated with quantitative metrics applied to the text representations. Previously, researchers have applied metrics from machine translation and image captioning to evaluate generated audio captions. Inspired by cognitive neuroscience research on auditory cognition, in this paper we present a novel metric that evaluates captions by taking into account how human listeners derive semantic information from sounds: Audio Captioning Evaluation on Semantics of Sound (ACES).
Original language: English
Title of host publication: 31st European Signal Processing Conference, EUSIPCO 2023 - Proceedings
Publisher: IEEE
Pages: 770-774
Number of pages: 5
ISBN (Electronic): 9789464593600
DOIs
Publication status: Published - 1 Jan 2023
Event: 31st European Signal Processing Conference, EUSIPCO 2023 - Helsinki, Finland
Duration: 4 Sept 2023 to 8 Sept 2023
https://eusipco2023.org/

Publication series

Series: European Signal Processing Conference
ISSN: 2219-5491

Conference

Conference: 31st European Signal Processing Conference, EUSIPCO 2023
Abbreviated title: EUSIPCO2023
Country/Territory: Finland
City: Helsinki
Period: 4/09/23 to 8/09/23

Keywords

  • automated audio captioning
  • evaluation metric
  • semantics
