Optimizing segmentation granularity for neural machine translation

Elizabeth Salesky; Andrew Runge; Alex Coda; Jan Niehues; Graham Neubig

doi:10.1007/s10590-019-09243-8

Elizabeth Salesky^*, Andrew Runge, Alex Coda, Jan Niehues, Graham Neubig

^*Corresponding author for this work

Advanced Computing Sciences

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

In neural machine translation (NMT), it has become standard to translate using subword units to allow for an open vocabulary and improve accuracy on infrequent words. Byte-pair encoding (BPE) and its variants are the predominant approach to generating these subwords, as they are unsupervised, resource-free, and empirically effective. However, the granularity of these subword units is a hyperparameter to be tuned for each language and task, using methods such as grid search. Tuning may be done inexhaustively or skipped entirely due to resource constraints, leading to sub-optimal performance. In this paper, we propose a method to automatically tune this parameter using only one training pass. We incrementally introduce new BPE vocabulary online based on the held-out validation loss, beginning with smaller, general subwords and adding larger, more specific units over the course of training. Our method matches the results found with grid search, optimizing segmentation granularity while significantly reducing overall training time. We also show benefits in training efficiency and performance improvements for rare words due to the way embeddings for larger units are incrementally constructed by combining those from smaller units.
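The two mechanisms named in the abstract (growing the BPE vocabulary when held-out validation loss stops improving, and building the embedding of a larger unit from its smaller parts) can be illustrated with a toy sketch. This is not the authors' implementation. The function names, the plateau rule in `validation_plateaued`, and the choice of averaging in `init_merged_embedding` are all assumptions made for illustration; the abstract only says that new vocabulary is introduced "based on the held-out validation loss" and that embeddings are built "by combining those from smaller units".

```python
from collections import Counter


def most_frequent_pair(corpus):
    """Return the most frequent adjacent symbol pair, as in a standard BPE merge step."""
    pairs = Counter()
    for seq in corpus:
        for left, right in zip(seq, seq[1:]):
            pairs[(left, right)] += 1
    return max(pairs, key=pairs.get) if pairs else None


def apply_merge(seq, pair):
    """Replace each non-overlapping occurrence of `pair` in `seq` with the merged symbol."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def init_merged_embedding(embeddings, pair):
    """ASSUMPTION: initialise the new unit as the average of its two parts.

    The abstract does not specify the combination function.
    """
    left, right = embeddings[pair[0]], embeddings[pair[1]]
    embeddings[pair[0] + pair[1]] = [(l + r) / 2 for l, r in zip(left, right)]
    return embeddings


def validation_plateaued(losses, min_delta=0.0):
    """ASSUMPTION: a hypothetical trigger; grow the vocabulary when the latest
    validation loss improved by no more than `min_delta` over the previous one."""
    return len(losses) >= 2 and losses[-2] - losses[-1] <= min_delta


# Toy usage: one vocabulary-growth step after a plateau in validation loss.
corpus = [list("aab"), list("aac")]
embeddings = {"a": [1.0, 3.0], "b": [0.0, 0.0], "c": [2.0, 2.0]}
if validation_plateaued([2.5, 2.5]):
    pair = most_frequent_pair(corpus)
    corpus = [apply_merge(seq, pair) for seq in corpus]
    embeddings = init_merged_embedding(embeddings, pair)
```

In a real training loop the check would run after each validation pass, and the new embedding row would be added to the model's embedding matrix rather than to a plain dictionary.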

Original language	English
Pages (from-to)	41-59
Number of pages	19
Journal	Machine Translation
Volume	34
Issue number	1
Early online date	24 Jan 2020
DOIs	https://doi.org/10.1007/s10590-019-09243-8
Publication status	Published - Apr 2020

Keywords

Neural machine translation
Subword units
Byte-pair encoding
Online optimization
Segmentation

Cite this

@article{d747de35c0284c2280cadbb4726990d0,

title = "Optimizing segmentation granularity for neural machine translation",

abstract = "In neural machine translation (NMT), it has become standard to translate using subword units to allow for an open vocabulary and improve accuracy on infrequent words. Byte-pair encoding (BPE) and its variants are the predominant approach to generating these subwords, as they are unsupervised, resource-free, and empirically effective. However, the granularity of these subword units is a hyperparameter to be tuned for each language and task, using methods such as grid search. Tuning may be done inexhaustively or skipped entirely due to resource constraints, leading to sub-optimal performance. In this paper, we propose a method to automatically tune this parameter using only one training pass. We incrementally introduce new BPE vocabulary online based on the held-out validation loss, beginning with smaller, general subwords and adding larger, more specific units over the course of training. Our method matches the results found with grid search, optimizing segmentation granularity while significantly reducing overall training time. We also show benefits in training efficiency and performance improvements for rare words due to the way embeddings for larger units are incrementally constructed by combining those from smaller units.",

keywords = "Neural machine translation, Subword units, Byte-pair encoding, Online optimization, Segmentation",

author = "Elizabeth Salesky and Andrew Runge and Alex Coda and Jan Niehues and Graham Neubig",

note = "Publisher Copyright: {\textcopyright} 2020, Springer Nature B.V.",

year = "2020",

month = apr,

doi = "10.1007/s10590-019-09243-8",

language = "English",

volume = "34",

pages = "41--59",

journal = "Machine Translation",

issn = "0922-6567",

publisher = "Springer",

number = "1",

}

TY - JOUR

T1 - Optimizing segmentation granularity for neural machine translation

AU - Salesky, Elizabeth

AU - Runge, Andrew

AU - Coda, Alex

AU - Niehues, Jan

AU - Neubig, Graham

PY - 2020/4

Y1 - 2020/4

N2 - In neural machine translation (NMT), it has become standard to translate using subword units to allow for an open vocabulary and improve accuracy on infrequent words. Byte-pair encoding (BPE) and its variants are the predominant approach to generating these subwords, as they are unsupervised, resource-free, and empirically effective. However, the granularity of these subword units is a hyperparameter to be tuned for each language and task, using methods such as grid search. Tuning may be done inexhaustively or skipped entirely due to resource constraints, leading to sub-optimal performance. In this paper, we propose a method to automatically tune this parameter using only one training pass. We incrementally introduce new BPE vocabulary online based on the held-out validation loss, beginning with smaller, general subwords and adding larger, more specific units over the course of training. Our method matches the results found with grid search, optimizing segmentation granularity while significantly reducing overall training time. We also show benefits in training efficiency and performance improvements for rare words due to the way embeddings for larger units are incrementally constructed by combining those from smaller units.

AB - In neural machine translation (NMT), it has become standard to translate using subword units to allow for an open vocabulary and improve accuracy on infrequent words. Byte-pair encoding (BPE) and its variants are the predominant approach to generating these subwords, as they are unsupervised, resource-free, and empirically effective. However, the granularity of these subword units is a hyperparameter to be tuned for each language and task, using methods such as grid search. Tuning may be done inexhaustively or skipped entirely due to resource constraints, leading to sub-optimal performance. In this paper, we propose a method to automatically tune this parameter using only one training pass. We incrementally introduce new BPE vocabulary online based on the held-out validation loss, beginning with smaller, general subwords and adding larger, more specific units over the course of training. Our method matches the results found with grid search, optimizing segmentation granularity while significantly reducing overall training time. We also show benefits in training efficiency and performance improvements for rare words due to the way embeddings for larger units are incrementally constructed by combining those from smaller units.

KW - Neural machine translation

KW - Subword units

KW - Byte-pair encoding

KW - Online optimization

KW - Segmentation

U2 - 10.1007/s10590-019-09243-8

DO - 10.1007/s10590-019-09243-8

M3 - Article

SN - 0922-6567

VL - 34

SP - 41

EP - 59

JO - Machine Translation

JF - Machine Translation

IS - 1

ER -