
Abstract

Deep neural networks (DNNs) for sound recognition learn to categorize a barking sound as a "dog" and a meowing sound as a "cat", but they do not exploit information inherent to the semantic relations between classes (e.g., both are animal vocalisations). Cognitive neuroscience research, however, suggests that human listeners automatically exploit higher-level semantic information about sound sources in addition to acoustic information. Inspired by this notion, we introduce a DNN that learns to recognize sounds and simultaneously learns the semantic relations between the sources (semDNN). We compared semDNN with a homologous network trained with categorical labels (catDNN). In sound recognition tasks, semDNN produces semantically more accurate labelling than catDNN, and semDNN embeddings preserve higher-level semantic relations between sound sources. Importantly, through a model-based analysis of human dissimilarity ratings of natural sounds, we show that semDNN approximates the behaviour of human listeners better than catDNN and several other DNN and NLP comparison models.
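The abstract does not say how the model-based analysis of human dissimilarity ratings is carried out. One common way to relate a network's embeddings to human dissimilarity ratings is representational similarity analysis (RSA). In RSA, you build a pairwise dissimilarity matrix from the model embeddings and rank-correlate its off-diagonal entries with the human matrix. The sketch below assumes that approach, with cosine distance and Spearman correlation. It is an illustration only and may differ from the paper's actual analysis. The helper names (`cosine_dissimilarity`, `rsa_score`, and so on) and the toy data are hypothetical.

```python
import numpy as np


def cosine_dissimilarity(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance (1 - cosine similarity) between row vectors."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    return 1.0 - unit @ unit.T


def upper_triangle(matrix: np.ndarray) -> np.ndarray:
    """Flatten the off-diagonal upper triangle of a square matrix."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def rank(values: np.ndarray) -> np.ndarray:
    """Rank values from 1..n; tied values share their mean rank."""
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Average the ordinal ranks within each group of tied values.
    _, inverse = np.unique(values, return_inverse=True)
    inverse = inverse.ravel()
    sums = np.bincount(inverse, weights=ranks)
    counts = np.bincount(inverse)
    return (sums / counts)[inverse]


def spearman(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(rank(a), rank(b))[0, 1])


def rsa_score(model_embeddings: np.ndarray, human_dissimilarity: np.ndarray) -> float:
    """Correlate a model's dissimilarity matrix with a human one (hypothetical helper)."""
    model_rdm = cosine_dissimilarity(model_embeddings)
    return spearman(upper_triangle(model_rdm), upper_triangle(human_dissimilarity))


if __name__ == "__main__":
    # Toy embeddings for four sounds: dog, cat, car, siren (made-up values).
    emb = np.array([[1.0, 0.9, 0.0],
                    [0.9, 1.0, 0.1],
                    [0.0, 0.1, 1.0],
                    [0.1, 0.0, 0.9]])
    # Toy symmetric human dissimilarity ratings (not data from the paper).
    human = np.array([[0.0, 0.2, 0.9, 0.8],
                      [0.2, 0.0, 0.85, 0.9],
                      [0.9, 0.85, 0.0, 0.3],
                      [0.8, 0.9, 0.3, 0.0]])
    print("RSA (Spearman) score:", rsa_score(emb, human))
```

Under this assumed analysis, you would compute a score like this for each model (for example, semDNN and catDNN embeddings of the same sounds). A higher rank correlation would indicate a closer match to the listeners' dissimilarity structure. Spearman correlation is used because it compares only the ordering of dissimilarities, not their scale.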
Original language: English
Title of host publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
Publisher: IEEE
ISBN (Electronic): 9781728163277
DOIs
Publication status: Published - 5 May 2023
Event: 48th IEEE International Conference on Acoustics, Speech and Signal Processing - Rhodes Island, Greece
Duration: 4 Jun 2023 to 10 Jun 2023
https://2023.ieeeicassp.org/

Publication series

Series: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2023-June
ISSN: 1520-6149

Conference

Conference: 48th IEEE International Conference on Acoustics, Speech and Signal Processing
Abbreviated title: ICASSP 2023
Country/Territory: Greece
City: Rhodes Island
Period: 4/06/23 to 10/06/23
Internet address: https://2023.ieeeicassp.org/

Keywords

  • acoustic-to-semantic transformation
  • auditory semantics
  • deep neural networks
  • natural sound recognition
  • semantic embeddings
