Deciphering the transformation of sounds into meaning: Insights from disentangling intermediate representations in sound-to-event DNNs

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Neural representations estimated from functional MRI (fMRI) responses to natural sounds in non-primary auditory cortical areas resemble those in intermediate layers of deep neural networks (DNNs) trained to recognize sounds. However, the nature of these representations remains poorly understood. In the current study, a convolutional DNN (YAMNet), pre-trained to map sound spectrograms to semantic categories, is used as a computer simulation of the human brain's processing of natural sounds. A novel sound dataset is introduced and employed to test the hypothesis that sound-to-event DNNs represent basic mechanisms of sound generation (here, human actions) and physical properties of the sources (here, object materials) in their intermediate layers. Systematic changes to those latent representations are made with the help of a disentangling flow model. The manipulations are shown to cause a predictable effect on the DNN's semantic output. By demonstrating this mechanism in silico, the current study paves the way for neuroscientific experiments aiming to verify it in vivo. Code available at https://github.com/TimHenry1995/LatentAudio.
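The core manipulation the abstract describes, mapping a latent activation into a disentangled space, editing one factor, and inverting back, can be sketched with a toy invertible map. This is a generic illustration, not the paper's trained flow model: the orthogonal matrix stands in for the disentangling flow, and the factor labels are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy invertible "flow": an orthogonal matrix Q maps a DNN's
# intermediate-layer activation z into a disentangled space y = Q @ z,
# where (by assumption) individual coordinates align with factors such
# as the sound-generating action or the object material.
# Orthogonality guarantees exact invertibility: Q.T @ Q == I.
dim = 8
Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))

z = rng.standard_normal(dim)   # stand-in for a latent activation
y = Q @ z                      # map into the disentangled space
y_edited = y.copy()
y_edited[0] += 2.0             # nudge one factor (e.g. "material")
z_edited = Q.T @ y_edited      # invert the flow back to latent space

# The edited latent vector would then be fed back into the remaining
# DNN layers to observe the effect on the semantic output.
assert np.allclose(Q.T @ (Q @ z), z)  # round-trip without edits is lossless
```

A normalizing flow generalizes this idea to a learned, nonlinear, but still exactly invertible transformation, which is what makes the "edit in disentangled space, then invert" procedure well defined.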
Original language: English
Article number: 131600
Number of pages: 10
Journal: Neurocomputing
Volume: 658
DOIs
Publication status: Published - 28 Dec 2025

Keywords

  • Machine learning
  • Latent space disentanglement
  • YAMNet
  • Auditory processing
  • Invertible neural network
  • Normalizing flow
  • Auditory cortex
  • Semantic representations
  • Density estimation
  • Maps
  • Network
  • Complex

