Intermediate acoustic-to-semantic representations link behavioral and neural responses to natural sounds

Bruno L. Giordano; Michele Esposito; Giancarlo Valente; Elia Formisano

doi:10.1038/s41593-023-01285-9

Corresponding authors: Bruno L. Giordano, Elia Formisano

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Recognizing sounds implicates the cerebral transformation of input waveforms into semantic representations. Although past research identified the superior temporal gyrus (STG) as a crucial cortical region, the computational fingerprint of these cerebral transformations remains poorly characterized. Here, we exploit a model comparison framework and contrasted the ability of acoustic, semantic (continuous and categorical) and sound-to-event deep neural network representation models to predict perceived sound dissimilarity and 7 T human auditory cortex functional magnetic resonance imaging responses. We confirm that spectrotemporal modulations predict early auditory cortex (Heschl's gyrus) responses, and that auditory dimensions (for example, loudness, periodicity) predict STG responses and perceived dissimilarity. Sound-to-event deep neural networks predict Heschl's gyrus responses similar to acoustic models but, notably, they outperform all competing models at predicting both STG responses and perceived dissimilarity. Our findings indicate that STG entails intermediate acoustic-to-semantic sound representations that neither acoustic nor semantic models can account for. These representations are compositional in nature and relevant to behavior.

Original language	English
Pages (from-to)	664-672
Number of pages	13
Journal	Nature Neuroscience
Volume	26
Issue number	4
Early online date	16 Mar 2023
DOIs	https://doi.org/10.1038/s41593-023-01285-9
Publication status	Published - Apr 2023

Access to Document

10.1038/s41593-023-01285-9 (Licence: CC BY)

Cite this

@article{971995c40ec045d59f9862ce1a09424b,

title = "Intermediate acoustic-to-semantic representations link behavioral and neural responses to natural sounds",

abstract = "Recognizing sounds implicates the cerebral transformation of input waveforms into semantic representations. Although past research identified the superior temporal gyrus (STG) as a crucial cortical region, the computational fingerprint of these cerebral transformations remains poorly characterized. Here, we exploit a model comparison framework and contrasted the ability of acoustic, semantic (continuous and categorical) and sound-to-event deep neural network representation models to predict perceived sound dissimilarity and 7 T human auditory cortex functional magnetic resonance imaging responses. We confirm that spectrotemporal modulations predict early auditory cortex (Heschl's gyrus) responses, and that auditory dimensions (for example, loudness, periodicity) predict STG responses and perceived dissimilarity. Sound-to-event deep neural networks predict Heschl's gyrus responses similar to acoustic models but, notably, they outperform all competing models at predicting both STG responses and perceived dissimilarity. Our findings indicate that STG entails intermediate acoustic-to-semantic sound representations that neither acoustic nor semantic models can account for. These representations are compositional in nature and relevant to behavior.",

author = "Giordano, {Bruno L.} and Michele Esposito and Giancarlo Valente and Elia Formisano",

note = "{\textcopyright} 2023. The Author(s).",

year = "2023",

month = apr,

doi = "10.1038/s41593-023-01285-9",

language = "English",

volume = "26",

pages = "664--672",

journal = "Nature Neuroscience",

issn = "1097-6256",

publisher = "Nature Publishing Group",

number = "4",

}

TY - JOUR

T1 - Intermediate acoustic-to-semantic representations link behavioral and neural responses to natural sounds

AU - Giordano, Bruno L.

AU - Esposito, Michele

AU - Valente, Giancarlo

AU - Formisano, Elia

PY - 2023/4

Y1 - 2023/4

AB - Recognizing sounds implicates the cerebral transformation of input waveforms into semantic representations. Although past research identified the superior temporal gyrus (STG) as a crucial cortical region, the computational fingerprint of these cerebral transformations remains poorly characterized. Here, we exploit a model comparison framework and contrasted the ability of acoustic, semantic (continuous and categorical) and sound-to-event deep neural network representation models to predict perceived sound dissimilarity and 7 T human auditory cortex functional magnetic resonance imaging responses. We confirm that spectrotemporal modulations predict early auditory cortex (Heschl's gyrus) responses, and that auditory dimensions (for example, loudness, periodicity) predict STG responses and perceived dissimilarity. Sound-to-event deep neural networks predict Heschl's gyrus responses similar to acoustic models but, notably, they outperform all competing models at predicting both STG responses and perceived dissimilarity. Our findings indicate that STG entails intermediate acoustic-to-semantic sound representations that neither acoustic nor semantic models can account for. These representations are compositional in nature and relevant to behavior.

U2 - 10.1038/s41593-023-01285-9

DO - 10.1038/s41593-023-01285-9

M3 - Article

C2 - 36928634

SN - 1097-6256

VL - 26

SP - 664

EP - 672

JO - Nature Neuroscience

JF - Nature Neuroscience

IS - 4

ER -