Unsupervised Machine Translation On Dravidian Languages

Sai Koneru*, Danni Liu, Jan Niehues

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

"Unsupervised Neural Machine translation (UNMT) is beneficial especially for under-resourced languages such as from the Dravidian family. They learn to translate between the source and target, relying solely on only monolingual corpora. However, UNMT systems fail in scenarios that occur often when dealing with low resource languages. Recent works have achieved state-of-the-art results by adding auxiliary parallel data with similar languages. In this work, we focus on unsupervised translation between English and Kannada by using limited amounts of auxiliary data between English and other Dravidian languages. We show that transliteration is essential in unsupervised translation between Dravidian languages, as they do not share a common writing system. We explore several model architectures that use the auxiliary data in order to maximize knowledge sharing and enable UNMT for dissimilar language pairs. We show from our experiments it is crucial for Kannada and reference languages to be similar. Further, we propose a method to measure language similarity to choose the most beneficial reference languages.
Original language: English
Title of host publication: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
Publisher: Association for Computational Linguistics
Pages: 55-64
Number of pages: 10
Publication status: Published - 20 Apr 2021
Event: The First Workshop on Speech and Language Technologies for Dravidian Languages - Online, EACL
Duration: 20 Apr 2021 - 20 Apr 2021
https://dravidianlangtech.github.io/2021/index.html

Workshop

Workshop: The First Workshop on Speech and Language Technologies for Dravidian Languages
Abbreviated title: DravidianLangTech-2021
Country/Territory: Unknown
Period: 20/04/21 - 20/04/21
