TY - BOOK
T1 - From sound to meaning
T2 - Brain-inspired deep neural networks for sound recognition
AU - Esposito, Michele
PY - 2025/5/8
Y1 - 2025/5/8
AB - This thesis investigates how humans recognise and make sense of complex everyday sounds by integrating insights from neuroscience and artificial intelligence (AI). Specifically, it focuses on developing and evaluating deep neural network (DNN) models that simulate how the human brain processes auditory stimuli. The overarching goal is to bridge the gap between biological auditory systems and artificial hearing systems, using computational models that are both functionally effective and biologically plausible. The first part of the research evaluates various auditory models, including traditional acoustic models and state-of-the-art DNNs, by comparing their ability to predict both brain activity (via functional MRI) and human behavioural judgments of sound similarity. The results highlight that certain DNNs capture a level of auditory representation between raw acoustic features and abstract semantic categories. This intermediate representation, referred to as “hyperacoustic,” is particularly prominent in the superior temporal gyrus (STG), suggesting it plays a critical role in transforming sound into meaning. The second part introduces semantically informed DNNs (semDNNs), which are trained using continuous semantic embeddings (Word2Vec and BERT) instead of traditional categorical labels with one-hot encoding. These models align more closely with human judgments and offer a cognitively inspired approach to sound categorisation, demonstrating the value of integrating linguistic knowledge into auditory models. Finally, the third part presents a multiscale, time-resolved DNN architecture that processes sound at multiple temporal scales and includes attention mechanisms to capture salient auditory events. This design mirrors the hierarchical and dynamic nature of the human auditory system and represents a significant step toward building models that can operate in real-world listening environments. This work contributes to cognitive neuroscience, machine learning, and auditory artificial intelligence by introducing computational models that achieve strong technical performance and provide insights into how humans perceive, interpret, and understand the sounds in their environment.
M3 - Doctoral Thesis
SN - 978-94-6473-803-2
CY - Maastricht
ER -