Speech Spectrogram Estimation from Intracranial Brain Activity using a Quantization Approach

M. Angrick*, C. Herff, G. Johnson, J. Shih, D. Krusienski, T. Schultz

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference article in proceeding, Academic, peer-review

Abstract

Direct synthesis from intracranial brain activity into acoustic speech might provide an intuitive and natural means of communication for speech-impaired users. In previous studies we have used logarithmic Mel-scaled speech spectrograms (logMels) as an intermediate representation in the decoding from ElectroCorticoGraphic (ECoG) recordings to an audible waveform. Mel-scaled speech spectrograms have a long tradition in acoustic speech processing and speech synthesis applications. In the past, we relied on regression approaches to find a mapping from brain activity to logMel spectral coefficients, due to the continuous feature space. However, regression tasks are unbounded, and thus neuronal fluctuations in brain activity may result in abnormally high amplitudes in a synthesized acoustic speech signal. To mitigate these issues, we propose two methods for quantizing power values, one based on the median and one on the logistic formula, which discretize the feature space of logarithmic Mel-scaled spectral coefficients, reducing complexity and restricting the number of intervals. We evaluate the practicability in a proof-of-concept with one participant through a simple classification based on linear discriminant analysis and compare the resulting waveform with the original speech. Reconstructed spectrograms achieve Pearson correlation coefficients with a mean of r = 0.5 ± 0.11 in a 5-fold cross-validation.
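
The abstract does not spell out how the median-based quantization is constructed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the values of one spectral bin are split recursively at their median into a bounded number of intervals, that each interval is represented by the median of its members, and that quality is scored with the Pearson correlation. All function names (`median_boundaries`, `quantize`, `interval_centers`, `pearson`) and the toy data are hypothetical. The logistic variant and the linear discriminant analysis classifier are left out.

```python
from bisect import bisect_right
from math import sqrt
from statistics import mean, median


def median_boundaries(values, depth):
    """Assumed scheme: split at the median recursively, giving up to 2**depth intervals."""
    if depth == 0 or len(values) < 2:
        return []
    m = median(values)
    lower = [v for v in values if v < m]
    upper = [v for v in values if v >= m]
    if not lower or not upper:
        return []  # all values identical on one side, no useful split
    return median_boundaries(lower, depth - 1) + [m] + median_boundaries(upper, depth - 1)


def quantize(values, boundaries):
    """Map each value to the index of the interval it falls into."""
    return [bisect_right(boundaries, v) for v in values]


def interval_centers(values, labels, n_intervals):
    """Representative value per interval: median of the values assigned to it."""
    centers = []
    for k in range(n_intervals):
        members = [v for v, lab in zip(values, labels) if lab == k]
        centers.append(median(members) if members else 0.0)
    return centers


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / sqrt(var_x * var_y)


# Toy stand-in for the logMel values of a single spectral bin (not real data).
bin_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
boundaries = median_boundaries(bin_values, depth=2)       # at most 4 intervals
labels = quantize(bin_values, boundaries)                 # class targets
centers = interval_centers(bin_values, labels, len(boundaries) + 1)
reconstructed = [centers[lab] for lab in labels]          # bounded by construction
score = pearson(bin_values, reconstructed)
print(boundaries, labels, round(score, 3))
```

In such a scheme the interval indices would serve as class labels for a classifier, and because every predicted class maps back to a fixed representative value, the reconstructed coefficients cannot exceed the range seen during training.
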
Original language: English
Title of host publication: 21st Annual Conference of the International Speech Communication Association
Subtitle of host publication: INTERSPEECH 2020
Place of Publication: Shanghai
Publisher: ISCA International Speech Communication Association
Pages: 2777-2781
Number of pages: 5
Volume: 2020-October
Publication status: Published - 2020
Event: Interspeech Conference - Virtual, China
Duration: 25 Oct 2020 → 29 Oct 2020
http://www.interspeech2020.org/

Publication series

Series: Interspeech
ISSN: 2308-457X

Conference

Conference: Interspeech Conference
Country/Territory: China
Period: 25/10/20 → 29/10/20
Keywords

  • neural signals for spoken communication
  • speech synthesis
  • electrocorticography
  • BCI
  • COMPUTER-INTERFACE
  • GAMMA ACTIVITY
  • SPOKEN
  • COMMUNICATION
