Abstract
Breathing is the primary mechanism for maintaining the subglottal pressure required for speech production. Speech can be seen as a systematic outflow of air during exhalation, shaped by linguistic content and prosodic factors. Sensing respiratory dynamics from speech is therefore plausible. In this paper, we explore techniques for sensing breathing from speech using deep learning architectures, including multi-task learning approaches. Estimating the breathing pattern from speech provides information about respiration rate and breathing capacity, and could thus help in assessing a person's pathological condition from their speech. Training and evaluating our model on our database of breathing signals and speech from 40 subjects yielded a sensitivity of 0.88 for breath event detection and a 5.6% error for breathing rate estimation.
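The abstract reports two evaluation figures without defining them. As a rough illustration only, the sketch below shows one common way such metrics could be computed: sensitivity as the fraction of reference breath events that were detected, and breathing rate error as the relative difference between an estimated and a reference rate. All function names, the mean-crossing rate estimator, and the synthetic test signal are assumptions made for this example; they are not taken from the paper and may differ from the authors' evaluation protocol.

```python
import math
from typing import List, Sequence


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of reference breath events that the detector found."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no reference breath events to score against")
    return true_positives / total


def breathing_rate_bpm(signal: Sequence[float], sample_rate_hz: float) -> float:
    """Estimate breaths per minute by counting upward crossings of the mean.

    Assumption: each breath cycle crosses the signal mean upward exactly once.
    This holds for clean, roughly periodic signals, not for noisy recordings.
    """
    if len(signal) < 2 or sample_rate_hz <= 0:
        raise ValueError("need at least two samples and a positive sample rate")
    mean = sum(signal) / len(signal)
    centered = [x - mean for x in signal]
    # One upward crossing (negative to non-negative) is counted as one breath.
    cycles = sum(1 for prev, cur in zip(centered, centered[1:]) if prev < 0 <= cur)
    duration_s = len(signal) / sample_rate_hz
    return 60.0 * cycles / duration_s


def rate_error_percent(estimated_bpm: float, reference_bpm: float) -> float:
    """Relative breathing rate error, in percent of the reference rate."""
    if reference_bpm <= 0:
        raise ValueError("reference rate must be positive")
    return abs(estimated_bpm - reference_bpm) / reference_bpm * 100.0


def synthetic_breathing(rate_bpm: float, duration_s: float,
                        sample_rate_hz: float, phase: float = 0.3) -> List[float]:
    """Hypothetical stand-in for a breathing signal: a phase-shifted sinusoid."""
    n_samples = int(round(duration_s * sample_rate_hz))
    freq_hz = rate_bpm / 60.0
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz + phase)
            for i in range(n_samples)]
```

For example, `breathing_rate_bpm(synthetic_breathing(15.0, 60.0, 25.0), 25.0)` recovers the 15 breaths per minute used to generate the synthetic signal, and `sensitivity(88, 12)` gives 0.88 for a hypothetical detector that finds 88 of 100 reference breath events. Real breathing signals would likely need smoothing or a more robust peak detector before counting cycles.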
Original language | English |
---|---|
Title of host publication | 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings |
Publisher | IEEE |
Pages | 1140-1144 |
Number of pages | 5 |
ISBN (Electronic) | 9781509066315 |
Publication status | Published - May 2020 |
Externally published | Yes |
Event | 45th International Conference on Acoustics, Speech, and Signal Processing - Online, Barcelona, Spain; Duration: 4 May 2020 → 8 May 2020; Conference number: 45; https://2020.ieeeicassp.org/ |
Publication series
Series | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
---|---|
Volume | 2020-May |
ISSN | 1520-6149 |
Conference
Conference | 45th International Conference on Acoustics, Speech, and Signal Processing |
---|---|
Abbreviated title | ICASSP 2020 |
Country/Territory | Spain |
City | Barcelona |
Period | 4/05/20 → 8/05/20 |
Keywords
- deep neural networks
- multi-task learning
- signal processing
- speech breathing
- speech technology