Abstract
Automated Audio Captioning is a multimodal task that aims to convert audio content into natural language. The performance of audio captioning systems is evaluated using quantitative metrics applied to the generated text. Previously, researchers have applied metrics from machine translation and image captioning to evaluate generated audio captions. Inspired by cognitive neuroscience research on auditory cognition, in this paper we present a novel metric that evaluates captions by taking into account how human listeners derive semantic information from sounds: Audio Captioning Evaluation on Semantics of Sound (ACES).
Original language | English |
---|---|
Title of host publication | 31st European Signal Processing Conference, EUSIPCO 2023 - Proceedings |
Publisher | IEEE |
Pages | 770-774 |
Number of pages | 5 |
ISBN (Electronic) | 9789464593600 |
Publication status | Published - 1 Jan 2023 |
Event | 31st European Signal Processing Conference, EUSIPCO 2023, Helsinki, Finland. Duration: 4 Sept 2023 → 8 Sept 2023. https://eusipco2023.org/ |
Publication series
Series | European Signal Processing Conference |
---|---|
ISSN | 2219-5491 |
Conference
Conference | 31st European Signal Processing Conference, EUSIPCO 2023 |
---|---|
Abbreviated title | EUSIPCO2023 |
Country/Territory | Finland |
City | Helsinki |
Period | 4/09/23 → 8/09/23 |
Keywords
- automated audio captioning
- evaluation metric
- semantics