Effective combination of pretrained models - KIT@IWSLT2022

Ngoc-Quan Pham; Tuan Nam Nguyen; Thai-Binh Nguyen; Danni Liu; Carlos Mullov; Jan Niehues; Alexander Waibel

doi:10.18653/v1/2022.iwslt-1.14

Ngoc-Quan Pham^*, Tuan Nam Nguyen, Thai-Binh Nguyen, Danni Liu, Carlos Mullov, Jan Niehues, Alexander Waibel

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

Pretrained models in acoustic and textual modalities can potentially improve speech translation for both Cascade and End-to-end approaches. In this evaluation, we aim at empirically looking for the answer by using the wav2vec, mBART50 and DeltaLM models to improve text and speech translation models. The experiments showed that the presence of these models together with an advanced audio segmentation method results in an improvement over the previous end-to-end system by up to 7 BLEU points. More importantly, the experiments showed that given enough data and modeling capacity to overcome the training difficulty, we can outperform even very competitive Cascade systems. In our experiments, this gap can be as large as 2.0 BLEU points, the same gap that the Cascade often led over the years.

Original language	English
Title of host publication	Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)
Publisher	Association for Computational Linguistics
Pages	190-197
Number of pages	8
ISBN (Print)	9781955917414
DOIs	https://doi.org/10.18653/v1/2022.iwslt-1.14
Publication status	Published - 2022
Event	The International Conference on Spoken Language Translation, Dublin, Ireland. Duration: 26 May 2022 → 27 May 2022. https://iwslt.org/2022/

Conference

Conference	The International Conference on Spoken Language Translation
Abbreviated title	19th IWSLT
Country/Territory	Ireland
City	Dublin
Period	26/05/22 → 27/05/22
Internet address	https://iwslt.org/2022/

Access to Document

10.18653/v1/2022.iwslt-1.14 (Licence: CC BY)

https://aclanthology.org/2022.iwslt-1.14.pdf

Cite this

@inproceedings{77c7c1c822e34c54937f4b5c4ed52a58,

title = "Effective combination of pretrained models - KIT@IWSLT2022",

abstract = "Pretrained models in acoustic and textual modalities can potentially improve speech translation for both Cascade and End-to-end approaches. In this evaluation, we aim at empirically looking for the answer by using the wav2vec, mBART50 and DeltaLM models to improve text and speech translation models. The experiments showed that the presence of these models together with an advanced audio segmentation method results in an improvement over the previous end-to-end system by up to 7 BLEU points. More importantly, the experiments showed that given enough data and modeling capacity to overcome the training difficulty, we can outperform even very competitive Cascade systems. In our experiments, this gap can be as large as 2.0 BLEU points, the same gap that the Cascade often led over the years.",

author = "Ngoc-Quan Pham and Nguyen, {Tuan Nam} and Thai-Binh Nguyen and Danni Liu and Carlos Mullov and Jan Niehues and Alexander Waibel",

note = "Funding Information: The projects on which this paper is based were funded by the Federal Ministry of Education and Research (BMBF) of Germany under the numbers 01IS18040A. The authors are responsible for the content of this publication. Publisher Copyright: {\textcopyright} 2022 Association for Computational Linguistics.; The International Conference on Spoken Language Translation, 19th IWSLT ; Conference date: 26-05-2022 Through 27-05-2022",

year = "2022",

doi = "10.18653/v1/2022.iwslt-1.14",

language = "English",

isbn = "9781955917414",

pages = "190--197",

booktitle = "Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)",

publisher = "Association for Computational Linguistics",

url = "https://iwslt.org/2022/",

}

Pham, N-Q, Nguyen, TN, Nguyen, T-B, Liu, D, Mullov, C, Niehues, J & Waibel, A 2022, Effective combination of pretrained models - KIT@IWSLT2022. in Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022). Association for Computational Linguistics, pp. 190-197, The International Conference on Spoken Language Translation, Dublin, Ireland, 26/05/22. https://doi.org/10.18653/v1/2022.iwslt-1.14

Effective combination of pretrained models - KIT@IWSLT2022. / Pham, Ngoc-Quan; Nguyen, Tuan Nam; Nguyen, Thai-Binh et al.
Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022). Association for Computational Linguistics, 2022. p. 190-197.

TY - GEN

T1 - Effective combination of pretrained models - KIT@IWSLT2022

AU - Pham, Ngoc-Quan

AU - Nguyen, Tuan Nam

AU - Nguyen, Thai-Binh

AU - Liu, Danni

AU - Mullov, Carlos

AU - Niehues, Jan

AU - Waibel, Alexander

N1 - Funding Information: The projects on which this paper is based were funded by the Federal Ministry of Education and Research (BMBF) of Germany under the numbers 01IS18040A. The authors are responsible for the content of this publication. Publisher Copyright: © 2022 Association for Computational Linguistics.

PY - 2022

Y1 - 2022

AB - Pretrained models in acoustic and textual modalities can potentially improve speech translation for both Cascade and End-to-end approaches. In this evaluation, we aim at empirically looking for the answer by using the wav2vec, mBART50 and DeltaLM models to improve text and speech translation models. The experiments showed that the presence of these models together with an advanced audio segmentation method results in an improvement over the previous end-to-end system by up to 7 BLEU points. More importantly, the experiments showed that given enough data and modeling capacity to overcome the training difficulty, we can outperform even very competitive Cascade systems. In our experiments, this gap can be as large as 2.0 BLEU points, the same gap that the Cascade often led over the years.

U2 - 10.18653/v1/2022.iwslt-1.14

DO - 10.18653/v1/2022.iwslt-1.14

M3 - Conference article in proceeding

SN - 9781955917414

SP - 190

EP - 197

BT - Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)

PB - Association for Computational Linguistics

T2 - The International Conference on Spoken Language Translation

Y2 - 26 May 2022 through 27 May 2022

ER -