Unsupervised Machine Translation On Dravidian Languages

Sai Koneru*, Danni Liu, Jan Niehues

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

"Unsupervised Neural Machine translation (UNMT) is beneficial especially for under-resourced languages such as from the Dravidian family. They learn to translate between the source and target, relying solely on only monolingual corpora. However, UNMT systems fail in scenarios that occur often when dealing with low resource languages. Recent works have achieved state-of-the-art results by adding auxiliary parallel data with similar languages. In this work, we focus on unsupervised translation between English and Kannada by using limited amounts of auxiliary data between English and other Dravidian languages. We show that transliteration is essential in unsupervised translation between Dravidian languages, as they do not share a common writing system. We explore several model architectures that use the auxiliary data in order to maximize knowledge sharing and enable UNMT for dissimilar language pairs. We show from our experiments it is crucial for Kannada and reference languages to be similar. Further, we propose a method to measure language similarity to choose the most beneficial reference languages.
Original language: English
Title of host publication: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
Publisher: Association for Computational Linguistics
Pages: 55-64
Number of pages: 10
Publication status: Published - 20 Apr 2021
Event: The First Workshop on Speech and Language Technologies for Dravidian Languages - Online, EACL
Duration: 20 Apr 2021 - 20 Apr 2021
https://dravidianlangtech.github.io/2021/index.html

Workshop

Workshop: The First Workshop on Speech and Language Technologies for Dravidian Languages
Abbreviated title: DravidianLangTech-2021
Country/Territory: Unknown
Period: 20/04/21 - 20/04/21
