Intermediate acoustic-to-semantic representations link behavioral and neural responses to natural sounds

Bruno L. Giordano*, Michele Esposito, Giancarlo Valente, Elia Formisano*

*Corresponding author for this work

Research output: Contribution to journal > Article > Academic > peer-review

Abstract

Recognizing sounds implicates the cerebral transformation of input waveforms into semantic representations. Although past research identified the superior temporal gyrus (STG) as a crucial cortical region, the computational fingerprint of these cerebral transformations remains poorly characterized. Here, we exploit a model comparison framework and contrast the ability of acoustic, semantic (continuous and categorical) and sound-to-event deep neural network representation models to predict perceived sound dissimilarity and 7 T human auditory cortex functional magnetic resonance imaging responses. We confirm that spectrotemporal modulations predict early auditory cortex (Heschl's gyrus) responses, and that auditory dimensions (for example, loudness, periodicity) predict STG responses and perceived dissimilarity. Sound-to-event deep neural networks predict Heschl's gyrus responses similarly to acoustic models but, notably, they outperform all competing models at predicting both STG responses and perceived dissimilarity. Our findings indicate that STG entails intermediate acoustic-to-semantic sound representations that neither acoustic nor semantic models can account for. These representations are compositional in nature and relevant to behavior.

Original language: English
Pages (from-to): 664-672
Number of pages: 13
Journal: Nature Neuroscience
Volume: 26
Issue number: 4
Early online date: 16 Mar 2023
Publication status: Published - Apr 2023