Improving Sequence-To-Sequence Speech Recognition Training with On-The-Fly Data Augmentation

Thai-Son Nguyen*, Sebastian Stüker, Jan Niehues, Alex Waibel

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

Sequence-to-Sequence (S2S) models have recently started to show state-of-the-art performance for automatic speech recognition (ASR). With these large and deep models, overfitting remains the largest problem, outweighing the performance improvements that can be obtained from better architectures. One solution to the overfitting problem is to increase the amount of available training data, and the variety it exhibits, with the help of data augmentation. In this paper we examine the influence of three data augmentation methods on the performance of two S2S model architectures. One of the data augmentation methods comes from the literature, while the other two are our own development: a time perturbation in the frequency domain and sub-sequence sampling. Our experiments on Switchboard and Fisher data show state-of-the-art performance for S2S models that are trained solely on the speech training data and do not use additional text data.
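
The abstract names the two new augmentation methods without describing how they work. The sketch below is only one plausible reading, not the authors' implementation. It assumes the "time perturbation in the frequency domain" can be approximated by resampling a (frames, bins) feature matrix along the time axis with linear interpolation, and that sub-sequence sampling cuts a contiguous run of words out of an utterance using word-level frame alignments. The function names, the single global stretch factor, and the alignment format are all hypothetical.

```python
import numpy as np


def time_stretch(features, factor):
    """Resample a (frames, bins) matrix along time by linear interpolation.

    Assumption: one global stretch factor per utterance; the paper may
    instead vary the factor within an utterance.
    """
    num_frames, num_bins = features.shape
    new_len = max(1, int(round(num_frames * factor)))
    # Positions in the original frame index space to read from.
    src_pos = np.linspace(0.0, num_frames - 1, new_len)
    frame_idx = np.arange(num_frames)
    columns = [np.interp(src_pos, frame_idx, features[:, b]) for b in range(num_bins)]
    return np.stack(columns, axis=1)


def sample_sub_sequence(features, words, rng):
    """Cut a random contiguous run of words out of one utterance.

    Assumption: `words` is a list of (token, start_frame, end_frame) tuples
    with an exclusive end frame, e.g. taken from a forced alignment.
    """
    first = int(rng.integers(0, len(words)))
    last = int(rng.integers(first, len(words)))  # upper bound is exclusive
    start_frame = words[first][1]
    end_frame = words[last][2]
    tokens = [token for token, _, _ in words[first:last + 1]]
    return features[start_frame:end_frame], tokens


# Toy utterance: 10 frames, 2 feature bins, 3 aligned words.
feats = np.arange(20, dtype=float).reshape(10, 2)
alignment = [("a", 0, 3), ("b", 3, 7), ("c", 7, 10)]

stretched = time_stretch(feats, 1.5)
segment, segment_tokens = sample_sub_sequence(feats, alignment, np.random.default_rng(0))
```

Both functions return new arrays and leave the input untouched, so they can be applied to each utterance as a batch is built, which is what "on-the-fly" in the title suggests.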

Original language: English
Title of host publication: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: IEEE
Pages: 7689-7693
Number of pages: 5
ISBN (Electronic): 978-1-5090-6631-5
ISBN (Print): 978-1-5090-6632-2
Publication status: Published - 2020
Event: 45th International Conference on Acoustics, Speech, and Signal Processing - Online, Barcelona, Spain
Duration: 4 May 2020 → 8 May 2020
Conference number: 45
https://2020.ieeeicassp.org/

Conference

Conference: 45th International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP 2020
Country/Territory: Spain
City: Barcelona
Period: 4/05/20 → 8/05/20

Keywords

  • sequence-to-sequence
  • self-attention
  • data augmentation
  • speed perturbation
  • sub-sequence
