Abstract
Direct synthesis from intracranial brain activity into acoustic speech might provide an intuitive and natural means of communication for speech-impaired users. In previous studies we have used logarithmic Mel-scaled speech spectrograms (logMels) as an intermediate representation in the decoding from ElectroCorticoGraphic (ECoG) recordings to an audible waveform. Mel-scaled speech spectrograms have a long tradition in acoustic speech processing and speech synthesis applications. In the past, because the feature space is continuous, we relied on regression approaches to find a mapping from brain activity to logMel spectral coefficients. However, regression outputs are unbounded, and thus neuronal fluctuations in brain activity may result in abnormally high amplitudes in a synthesized acoustic speech signal. To mitigate these issues, we propose two methods for quantizing power values that discretize the feature space of logarithmic Mel-scaled spectral coefficients, using the median and the logistic formula, respectively, which reduces complexity and restricts the number of intervals. We evaluate the practicability in a proof-of-concept with one participant through a simple classification based on linear discriminant analysis and compare the resulting waveform with the original speech. Reconstructed spectrograms achieve Pearson correlation coefficients with a mean of r = 0.5 ± 0.11 in a 5-fold cross-validation.
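The abstract does not spell out the two quantization schemes, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that the median-based variant amounts to equal-population intervals (what repeated median splits would produce) and that the logistic variant passes standardized values through a sigmoid before cutting the unit interval into equal parts. All function names, the choice of 8 levels, and the synthetic data standing in for one logMel coefficient are hypothetical. The classifier itself (linear discriminant analysis on ECoG features) is omitted; the sketch only shows how discrete labels bound the reconstructed values and how a Pearson correlation could be computed.

```python
import numpy as np

def quantile_edges(train, n_levels):
    """Interior bin edges giving equal-population intervals (assumed reading of
    the median-based scheme: repeated median splits)."""
    qs = np.linspace(0.0, 1.0, n_levels + 1)[1:-1]
    return np.quantile(train, qs)

def median_quantize(x, edges):
    """Map each value to the index of the interval it falls into."""
    return np.searchsorted(edges, x, side="right")

def logistic_quantize(x, mu, sigma, n_levels):
    """Assumed reading of the logistic scheme: squash standardized values into
    (0, 1) with a sigmoid, then split that range into n_levels equal parts."""
    s = 1.0 / (1.0 + np.exp(-(x - mu) / sigma))
    return np.minimum((s * n_levels).astype(int), n_levels - 1)

def level_values(train, train_idx, n_levels):
    """Representative value per class: median of the training values in it
    (falls back to the overall median for an empty class)."""
    fallback = np.median(train)
    return np.array([
        np.median(train[train_idx == k]) if np.any(train_idx == k) else fallback
        for k in range(n_levels)
    ])

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-in for one logMel coefficient over time (not real data).
rng = np.random.default_rng(0)
logmel = rng.normal(loc=-5.0, scale=2.0, size=2000)
n_levels = 8  # hypothetical number of intervals

edges = quantile_edges(logmel, n_levels)
idx_med = median_quantize(logmel, edges)
recon_med = level_values(logmel, idx_med, n_levels)[idx_med]

idx_log = logistic_quantize(logmel, logmel.mean(), logmel.std(), n_levels)
recon_log = level_values(logmel, idx_log, n_levels)[idx_log]

print(pearson_r(logmel, recon_med), pearson_r(logmel, recon_log))
```

Because every predicted class maps back to a value observed in training, a reconstruction of this kind cannot exceed the training range, which is the property the abstract contrasts with unbounded regression outputs.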
Original language | English |
---|---|
Title of host publication | 21st Annual Conference of the International Speech Communication Association |
Subtitle of host publication | INTERSPEECH 2020 |
Place of Publication | Shanghai |
Publisher | ISCA International Speech Communication Association |
Pages | 2777-2781 |
Number of pages | 5 |
Volume | 2020-October |
Publication status | Published - 2020 |
Event | Interspeech Conference - Virtual, China Duration: 25 Oct 2020 → 29 Oct 2020 http://www.interspeech2020.org/ |
Publication series
Series | Interspeech |
---|---|
ISSN | 2308-457X |
Conference
Conference | Interspeech Conference |
---|---|
Country/Territory | China |
Period | 25/10/20 → 29/10/20 |
Keywords
- neural signals for spoken communication
- speech synthesis
- electrocorticography
- BCI
- COMPUTER-INTERFACE
- GAMMA ACTIVITY
- SPOKEN
- COMMUNICATION