TY - JOUR
T1 - Dynamic time-locking mechanism in the cortical representation of spoken words
AU - Nora, A.
AU - Faisal, A.
AU - Seol, J.
AU - Renvall, H.
AU - Formisano, E.
AU - Salmelin, R.
N1 - Funding Information:
This work was supported by the Academy of Finland Grants 255349, 256887, 292552, and 315553 (to R.S.) and 277655 (to H.R.); the Finnish Cultural Foundation (H.R.); the Sigrid Jusélius Foundation (R.S.); Maastricht University (E.F.); the Dutch Province of Limburg (E.F.); the Netherlands Organization for Scientific Research (NWO) Grant 453-12-002 (to E.F.); the Doctoral Program Brain and Mind (A.N.); the Foundation for Aalto University Science and Technology (A.N.); and the Emil Aaltonen Foundation (A.N.). A.N. and A.F. contributed equally to this work.
We thank Mia Liljeström and Pekka Laitio for providing customized code for the source parcellation, Ossi Lehtonen for assisting with the source modeling, Jan Kujala for assistance in sound feature analysis, Tiina Lindh-Knuutila for assistance with the corpus vectors, and Sasa Kivisaari for comments on this manuscript.
Publisher Copyright:
© 2020 Nora et al.
PY - 2020/7/1
Y1 - 2020/7/1
N2 - Human speech has a unique capacity to carry and communicate rich meanings. However, it is not known how this highly dynamic and variable perceptual signal is mapped to existing linguistic and semantic representations. Here, we utilized the natural acoustic variability of sounds and mapped them to magnetoencephalography (MEG) data using physiologically inspired machine-learning models. We aimed to determine how well the models, which differed in their representation of temporal information, serve to decode and reconstruct spoken words from MEG recordings in 16 healthy volunteers. We discovered that dynamic time-locking of the cortical activation to the unfolding speech input is crucial for the encoding of the acoustic-phonetic features of speech. In contrast, time-locking was not highlighted in cortical processing of non-speech environmental sounds that conveyed the same meanings as the spoken words, including human-made sounds with temporal modulation content similar to speech. The amplitude envelope of the spoken words was particularly well reconstructed from cortical evoked responses. Our results indicate that speech is encoded cortically with especially high temporal fidelity. This speech tracking by evoked responses may partly reflect the same underlying neural mechanism as the frequently reported entrainment of cortical oscillations to the amplitude envelope of speech. Furthermore, the phoneme content was reflected in cortical evoked responses simultaneously with the spectrotemporal features, pointing to an instantaneous transformation of the unfolding acoustic features into linguistic representations during speech processing. Significance statement: It has remained unclear how speech is processed differently from other sounds with comparable meanings and spectrotemporal characteristics. In this study, computational modeling of cortical responses to spoken words highlights the relevance of temporal tracking of spectrotemporal features especially for speech. This mechanism is likely pivotal for transforming acoustic-phonetic features into linguistic representations.
AB - Human speech has a unique capacity to carry and communicate rich meanings. However, it is not known how this highly dynamic and variable perceptual signal is mapped to existing linguistic and semantic representations. Here, we utilized the natural acoustic variability of sounds and mapped them to magnetoencephalography (MEG) data using physiologically inspired machine-learning models. We aimed to determine how well the models, which differed in their representation of temporal information, serve to decode and reconstruct spoken words from MEG recordings in 16 healthy volunteers. We discovered that dynamic time-locking of the cortical activation to the unfolding speech input is crucial for the encoding of the acoustic-phonetic features of speech. In contrast, time-locking was not highlighted in cortical processing of non-speech environmental sounds that conveyed the same meanings as the spoken words, including human-made sounds with temporal modulation content similar to speech. The amplitude envelope of the spoken words was particularly well reconstructed from cortical evoked responses. Our results indicate that speech is encoded cortically with especially high temporal fidelity. This speech tracking by evoked responses may partly reflect the same underlying neural mechanism as the frequently reported entrainment of cortical oscillations to the amplitude envelope of speech. Furthermore, the phoneme content was reflected in cortical evoked responses simultaneously with the spectrotemporal features, pointing to an instantaneous transformation of the unfolding acoustic features into linguistic representations during speech processing. Significance statement: It has remained unclear how speech is processed differently from other sounds with comparable meanings and spectrotemporal characteristics. In this study, computational modeling of cortical responses to spoken words highlights the relevance of temporal tracking of spectrotemporal features especially for speech. This mechanism is likely pivotal for transforming acoustic-phonetic features into linguistic representations.
U2 - 10.1523/ENEURO.0475-19.2020
DO - 10.1523/ENEURO.0475-19.2020
M3 - Article
C2 - 32513662
SN - 2373-2822
VL - 7
JO - eNeuro
JF - eNeuro
IS - 4
M1 - 0475-19
ER -