TY - GEN
T1 - Improving Zero-shot Translation with Language-Independent Constraints
AU - Pham, Ngoc-Quan
AU - Niehues, Jan
AU - Ha, Thanh-Le
AU - Waibel, Alex
N1 - Funding Information:
The project ELITR leading to this publication has received funding from the European Union's Horizon 2020 Research and Innovation Programme under grant agreement No. 825460. We thank Elizabeth Salesky for her constructive comments.
Publisher Copyright:
© 2019 Association for Computational Linguistics
PY - 2019
Y1 - 2019
AB - An important concern in training multilingual neural machine translation (NMT) is translating between language pairs unseen during training, i.e., zero-shot translation. Improving this ability kills two birds with one stone: it provides an alternative to pivot translation and allows us to better understand how the model captures information between languages. In this work, we investigate this capability of multilingual NMT models. First, we intentionally create an encoder architecture that is independent of the source language. Such experiments shed light on the ability of NMT encoders to learn multilingual representations in general. Based on this proof of concept, we design regularization methods for the standard Transformer model so that the whole architecture becomes more robust in zero-shot conditions. We investigate the behaviour of such models on the standard IWSLT 2017 multilingual dataset, achieving an average improvement of 2.23 BLEU points across 12 language pairs compared to the zero-shot performance of a state-of-the-art multilingual system. Additionally, we carry out further experiments in which the effect is confirmed even for language pairs with multiple intermediate pivots.
U2 - 10.18653/v1/W19-5202
DO - 10.18653/v1/W19-5202
M3 - Conference article in proceedings
SP - 13
EP - 23
BT - Fourth Conference on Machine Translation (WMT 2019), Vol. 1: Research Papers
ER -