Abstract
Deep neural networks (DNNs) for sound recognition learn to categorize a barking sound as a "dog" and a meowing sound as a "cat" but do not exploit information inherent to the semantic relations between classes (e.g., both are animal vocalisations). Cognitive neuroscience research, however, suggests that human listeners automatically exploit higher-level semantic information about the sources in addition to acoustic information. Inspired by this notion, we introduce a DNN that learns to recognize sounds and simultaneously learns the semantic relations between the sources (semDNN). Comparison of semDNN with a homologous network trained with categorical labels (catDNN) revealed that semDNN produces semantically more accurate labelling than catDNN in sound recognition tasks and that semDNN embeddings preserve higher-level semantic relations between sound sources. Importantly, through a model-based analysis of human dissimilarity ratings of natural sounds, we show that semDNN approximates the behaviour of human listeners better than catDNN and several other DNN and NLP comparison models.
Original language | English |
---|---|
Title of host publication | ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings |
Publisher | IEEE |
ISBN (Electronic) | 9781728163277 |
Publication status | Published - 5 May 2023 |
Event | 48th IEEE International Conference on Acoustics, Speech and Signal Processing - Rhodes Island, Greece. Duration: 4 Jun 2023 → 10 Jun 2023. https://2023.ieeeicassp.org/ |
Publication series
Series | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
---|---|
Volume | 2023-June |
ISSN | 1520-6149 |
Conference
Conference | 48th IEEE International Conference on Acoustics, Speech and Signal Processing |
---|---|
Abbreviated title | ICASSP 2023 |
Country/Territory | Greece |
City | Rhodes Island |
Period | 4/06/23 → 10/06/23 |
Internet address | https://2023.ieeeicassp.org/ |
Keywords
- acoustic-to-semantic transformation
- auditory semantics
- deep neural networks
- natural sound recognition
- semantic embeddings