Tackling Data Scarcity In Speech Translation Using Zero-Shot Multilingual Machine Translation Techniques

Tu Anh Dinh; Danni Liu; Jan Niehues

doi:10.1109/ICASSP43922.2022.9746815

Tu Anh Dinh*, Danni Liu, Jan Niehues

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

Recently, end-to-end speech translation (ST) has gained significant attention because it avoids error propagation. However, the approach suffers from data scarcity: it depends heavily on direct ST data and makes less efficient use of speech transcription and text translation data, which are often more readily available. In the related field of multilingual text translation, several techniques have been proposed for zero-shot translation. A main idea is to increase the similarity of semantically similar sentences in different languages. We investigate whether these ideas can be applied to speech translation by building ST models trained on speech transcription and text translation data, and we examine the effects of data augmentation and an auxiliary loss function. The techniques were successfully applied to few-shot ST using limited ST data, with improvements of up to +12.9 BLEU points compared to direct end-to-end ST and +3.1 BLEU points compared to ST models fine-tuned from an ASR model.
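
As a rough illustration of the idea of increasing the similarity between representations of semantically equivalent inputs, the following minimal Python sketch adds an auxiliary similarity term to a task loss. It is not the authors' implementation: the abstract does not specify the loss, so the mean pooling, the squared-distance measure, the `aux_weight` parameter, and all function names are assumptions chosen purely for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(states: Sequence[Sequence[float]]) -> Vector:
    """Average a sequence of encoder state vectors over the time axis."""
    if not states:
        raise ValueError("need at least one encoder state")
    dim = len(states[0])
    return [sum(state[i] for state in states) / len(states) for i in range(dim)]


def similarity_loss(
    speech_states: Sequence[Sequence[float]],
    text_states: Sequence[Sequence[float]],
) -> float:
    """Mean squared difference between the pooled speech and text encodings.

    Assumption: both inputs come from encoders with the same hidden size,
    and the two sequences may have different lengths.
    """
    pooled_speech = mean_pool(speech_states)
    pooled_text = mean_pool(text_states)
    if len(pooled_speech) != len(pooled_text):
        raise ValueError("hidden sizes of the two encodings must match")
    squared = [(a - b) ** 2 for a, b in zip(pooled_speech, pooled_text)]
    return sum(squared) / len(squared)


def total_loss(
    task_loss: float,
    speech_states: Sequence[Sequence[float]],
    text_states: Sequence[Sequence[float]],
    aux_weight: float = 1.0,
) -> float:
    """Combine a task loss with the weighted auxiliary similarity term."""
    return task_loss + aux_weight * similarity_loss(speech_states, text_states)


# Toy example: three speech frames and two text tokens, hidden size 2.
speech = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
text = [[2.0, 4.0], [4.0, 6.0]]
print(similarity_loss(speech, text))
print(total_loss(2.0, speech, text, aux_weight=0.5))
```

Pooling over time before comparing is one simple way to handle the length mismatch between a speech utterance and its transcript; other alignment or distance choices are equally plausible under the abstract's description.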

Original language	English
Title of host publication	ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher	IEEE
Pages	6222-6226
Number of pages	5
ISBN (Print)	9781665405409
DOIs	https://doi.org/10.1109/ICASSP43922.2022.9746815
Publication status	Published - 2022
Event	47th IEEE International Conference on Acoustics, Speech and Signal Processing - Online, Singapore, Singapore; Duration: 22 May 2022 → 27 May 2022; Conference number: 47; https://2022.ieeeicassp.org/

Publication series

Series	International Conference on Acoustics Speech and Signal Processing Proceedings
ISSN	1520-6149

Conference

Conference	47th IEEE International Conference on Acoustics, Speech and Signal Processing
Abbreviated title	ICASSP 2022
Country/Territory	Singapore
City	Singapore
Period	22/05/22 → 27/05/22
Internet address	https://2022.ieeeicassp.org/

Keywords

speech translation
zero-shot
few-shot
machine translation
multi-task

Access to Document

10.1109/ICASSP43922.2022.9746815

https://ieeexplore.ieee.org/document/9746815

Cite this

@inproceedings{f48dd3b7f002488cb7ae128728faeee9,
  title = "Tackling Data Scarcity In Speech Translation Using Zero-Shot Multilingual Machine Translation Techniques",
  abstract = "Recently, end-to-end speech translation (ST) has gained significant attention as it avoids error propagation. However, the approach suffers from data scarcity. It heavily depends on direct ST data and is less efficient in making use of speech transcription and text translation data, which is often more easily available. In the related field of multilingual text translation, several techniques have been proposed for zero-shot translation. A main idea is to increase the similarity of semantically similar sentences in different languages. We investigate whether these ideas can be applied to speech translation, by building ST models trained on speech transcription and text translation data. We investigate the effects of data augmentation and auxiliary loss function. The techniques were successfully applied to few-shot ST using limited ST data, with improvements of up to +12.9 BLEU points compared to direct end-to-end ST and +3.1 BLEU points compared to ST models fine-tuned from ASR model.",
  keywords = "speech translation, zero-shot, few-shot, machine translation, multi-task",
  author = "Dinh, {Tu Anh} and Danni Liu and Jan Niehues",
  year = "2022",
  doi = "10.1109/ICASSP43922.2022.9746815",
  language = "English",
  isbn = "9781665405409",
  series = "International Conference on Acoustics Speech and Signal Processing Proceedings",
  publisher = "IEEE",
  pages = "6222--6226",
  booktitle = "ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)",
  address = "United States",
  note = "47th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022 ; Conference date: 22-05-2022 Through 27-05-2022",
  url = "https://2022.ieeeicassp.org/",
}

Dinh, TA, Liu, D & Niehues, J 2022, Tackling Data Scarcity In Speech Translation Using Zero-Shot Multilingual Machine Translation Techniques. in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, International Conference on Acoustics Speech and Signal Processing Proceedings, pp. 6222-6226, 47th IEEE International Conference on Acoustics, Speech and Signal Processing, Singapore, Singapore, 22/05/22. https://doi.org/10.1109/ICASSP43922.2022.9746815

Tackling Data Scarcity In Speech Translation Using Zero-Shot Multilingual Machine Translation Techniques. / Dinh, Tu Anh; Liu, Danni; Niehues, Jan.
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022. p. 6222-6226 (International Conference on Acoustics Speech and Signal Processing Proceedings).

TY - GEN
T1 - Tackling Data Scarcity In Speech Translation Using Zero-Shot Multilingual Machine Translation Techniques
AU - Dinh, Tu Anh
AU - Liu, Danni
AU - Niehues, Jan
N1 - Conference code: 47
PY - 2022
Y1 - 2022
N2 - Recently, end-to-end speech translation (ST) has gained significant attention as it avoids error propagation. However, the approach suffers from data scarcity. It heavily depends on direct ST data and is less efficient in making use of speech transcription and text translation data, which is often more easily available. In the related field of multilingual text translation, several techniques have been proposed for zero-shot translation. A main idea is to increase the similarity of semantically similar sentences in different languages. We investigate whether these ideas can be applied to speech translation, by building ST models trained on speech transcription and text translation data. We investigate the effects of data augmentation and auxiliary loss function. The techniques were successfully applied to few-shot ST using limited ST data, with improvements of up to +12.9 BLEU points compared to direct end-to-end ST and +3.1 BLEU points compared to ST models fine-tuned from ASR model.
AB - Recently, end-to-end speech translation (ST) has gained significant attention as it avoids error propagation. However, the approach suffers from data scarcity. It heavily depends on direct ST data and is less efficient in making use of speech transcription and text translation data, which is often more easily available. In the related field of multilingual text translation, several techniques have been proposed for zero-shot translation. A main idea is to increase the similarity of semantically similar sentences in different languages. We investigate whether these ideas can be applied to speech translation, by building ST models trained on speech transcription and text translation data. We investigate the effects of data augmentation and auxiliary loss function. The techniques were successfully applied to few-shot ST using limited ST data, with improvements of up to +12.9 BLEU points compared to direct end-to-end ST and +3.1 BLEU points compared to ST models fine-tuned from ASR model.
KW - speech translation
KW - zero-shot
KW - few-shot
KW - machine translation
KW - multi-task
UR - https://arxiv.org/abs/2201.11172
U2 - 10.1109/ICASSP43922.2022.9746815
DO - 10.1109/ICASSP43922.2022.9746815
M3 - Conference article in proceeding
SN - 9781665405409
T3 - International Conference on Acoustics Speech and Signal Processing Proceedings
SP - 6222
EP - 6226
BT - ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
PB - IEEE
T2 - 47th IEEE International Conference on Acoustics, Speech and Signal Processing
Y2 - 22 May 2022 through 27 May 2022
ER -

Dinh TA, Liu D, Niehues J. Tackling Data Scarcity In Speech Translation Using Zero-Shot Multilingual Machine Translation Techniques. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2022. p. 6222-6226. (International Conference on Acoustics Speech and Signal Processing Proceedings). doi: 10.1109/ICASSP43922.2022.9746815
