Data Augmentation for Low-Resource Neural Machine Translation

Abstract

The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthetically created contexts. Experimental results on simulated low-resource settings show that our method improves translation quality by up to 2.9 BLEU points over the baseline and up to 3.2 BLEU over back-translation. –(Fadaee, Bisazza, and Monz 2017)
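The abstract's core idea, placing rare words into new contexts on both sides of a sentence pair, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not spell out how substitution positions or target-side translations are chosen, so the word alignment, the one-entry `lexicon`, the frequency threshold `max_count`, and the helper names `rare_words` and `augment` are all hypothetical and hand-specified here for illustration.

```python
from collections import Counter


def rare_words(corpus, max_count=1):
    """Return source-side tokens seen at most `max_count` times (assumed threshold)."""
    counts = Counter(tok for src, _ in corpus for tok in src.split())
    return {w for w, c in counts.items() if c <= max_count}


def augment(pair, alignment, position, new_src, lexicon):
    """Build a synthetic pair: swap in `new_src` at `position` on the source side
    and put its lexicon translation at the aligned target position."""
    src, tgt = pair[0].split(), pair[1].split()
    src[position] = new_src
    tgt[alignment[position]] = lexicon[new_src]
    return " ".join(src), " ".join(tgt)


# Toy lowercased English-German corpus (invented for this example).
corpus = [
    ("i saw a dog", "ich sah einen hund"),
    ("i saw a dog today", "ich sah heute einen hund"),
    ("a wolf", "ein wolf"),
]

# Hand-written assumptions: a word-level translation lexicon and a
# source-index -> target-index alignment for the first sentence pair.
lexicon = {"wolf": "wolf"}
alignment = {0: 0, 1: 1, 2: 2, 3: 3}

rare = rare_words(corpus)
# Place the rare word "wolf" into the context of the first sentence pair.
new_pair = augment(corpus[0], alignment, 3, "wolf", lexicon)
print(sorted(rare))
print(new_pair)
```

In a realistic setting the alignment and lexicon would presumably come from an automatic aligner, and some fluency check would be needed to reject implausible substitutions; the sketch only shows the shape of the augmented pair.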

References

Fadaee, Marzieh, Arianna Bisazza, and Christof Monz. 2017. “Data Augmentation for Low-Resource Neural Machine Translation.” In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), edited by Regina Barzilay and Min-Yen Kan, 567–73. Vancouver, Canada: Association for Computational Linguistics. https://doi.org/10.18653/v1/P17-2090.

Citation

BibTeX citation:

@online{bochman2021,
  author = {Bochman, Oren},
  title = {Data {Augmentation} for {Low-Resource} {Neural} {Machine}
    {Translation}},
  date = {2021-05-14},
  url = {https://orenbochman.github.io/notes-nlp/reviews/paper/2017-data-augmentation-low-resource-NMT/},
  langid = {en}
}

For attribution, please cite this work as:

Bochman, Oren. 2021. “Data Augmentation for Low-Resource Neural Machine Translation.” May 14, 2021. https://orenbochman.github.io/notes-nlp/reviews/paper/2017-data-augmentation-low-resource-NMT/.