Neural machine translation: A review of methods, resources, and tools

Z Tan, S Wang, Z Yang, G Chen, X Huang, M Sun… - AI Open, 2020 - Elsevier
Abstract Machine translation (MT) is an important sub-field of natural language processing
that aims to translate natural languages using computers. In recent years, end-to-end neural …

Are sixteen heads really better than one?

P Michel, O Levy, G Neubig - Advances in neural …, 2019 - proceedings.neurips.cc
Multi-headed attention is a driving force behind recent state-of-the-art NLP models. By
applying multiple attention mechanisms in parallel, it can express sophisticated functions …
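
A minimal NumPy sketch of the mechanism at stake here: multi-head attention with a per-head binary mask, so that zeroing an entry of `head_mask` ablates a head in the spirit of the paper's pruning experiments. The mask is an illustrative device of ours, not the paper's code.

```python
# Multi-head attention with an optional per-head ablation mask (a sketch).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads, head_mask=None):
    """x: (seq, d_model); W*: (d_model, d_model); head_mask: (n_heads,) of 0/1."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    # Project, then split into heads: (n_heads, seq, d_head)
    q = (x @ Wq).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, seq, seq)
    out = softmax(scores) @ v                            # (n_heads, seq, d_head)
    if head_mask is not None:                            # zero a head to ablate it
        out = out * head_mask[:, None, None]
    out = out.transpose(1, 0, 2).reshape(seq, d_model)   # concatenate heads
    return out @ Wo
```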

Mask-predict: Parallel decoding of conditional masked language models

M Ghazvininejad, O Levy, Y Liu… - arXiv preprint arXiv …, 2019 - arxiv.org
Most machine translation systems generate text autoregressively from left to right. We,
instead, use a masked language modeling objective to train a model to predict any subset of …
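
A hedged sketch of the decoding loop this abstract describes, assuming a hypothetical `model(tokens)` that returns per-position predicted ids and confidences for a conditional masked LM, and a linear mask-decay schedule of the kind the paper uses.

```python
# Iterative mask-predict decoding (a sketch; `model` is hypothetical).
import numpy as np

def mask_predict(model, length, mask_id, iterations=10):
    tokens = np.full(length, mask_id)                # start fully masked
    probs = np.zeros(length)
    for t in range(1, iterations + 1):
        pred_ids, pred_probs = model(tokens)         # predict all positions in parallel
        masked = tokens == mask_id
        tokens[masked] = pred_ids[masked]            # fill the masked slots
        probs[masked] = pred_probs[masked]
        n_mask = int(length * (1 - t / iterations))  # linear mask-decay schedule
        if n_mask == 0:
            break
        lowest = np.argsort(probs)[:n_mask]          # re-mask least confident tokens
        tokens[lowest] = mask_id
    return tokens
```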

Non-autoregressive machine translation with latent alignments

C Saharia, W Chan, S Saxena, M Norouzi - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents two strong methods, CTC and Imputer, for non-autoregressive machine
translation that model latent alignments with dynamic programming. We revisit CTC for …
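
For context, the CTC collapse function that makes these latent alignments tractable: adjacent repeats merge, then blanks drop, so many alignments map to one output string. This is the standard definition, sketched with `None` as the blank symbol.

```python
# The CTC collapse function: merge adjacent repeats, then remove blanks.
from itertools import groupby

def ctc_collapse(alignment, blank=None):
    """Map an alignment over {tokens, blank} to its output sequence."""
    merged = [tok for tok, _ in groupby(alignment)]  # merge adjacent repeats
    return [tok for tok in merged if tok != blank]   # drop blank symbols

# e.g. ctc_collapse(['a', 'a', None, 'b', 'b', None, 'b']) == ['a', 'b', 'b']
```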

Beyond BLEU: training neural machine translation with semantic similarity

J Wieting, T Berg-Kirkpatrick, K Gimpel… - arXiv preprint arXiv …, 2019 - arxiv.org
While most neural machine translation (NMT) systems are still trained using maximum
likelihood estimation, recent work has demonstrated that optimizing systems to directly …
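
A PyTorch sketch of the general recipe, not this paper's exact objective: minimum-risk-style training in which sampled translations are weighted by a semantic-similarity reward rather than scored by likelihood alone. `sample` and `sim` are hypothetical helpers standing in for a real sampler and similarity model.

```python
# Risk-based training with a semantic-similarity reward (a sketch).
import torch

def similarity_risk_loss(model, src, ref, n_samples=8):
    # `sample` (hypothetical) draws candidates with their log-probabilities.
    hyps, log_probs = sample(model, src, n=n_samples)
    # `sim` (hypothetical) scores semantic similarity to the reference.
    rewards = torch.tensor([sim(h, ref) for h in hyps])
    weights = torch.softmax(log_probs, dim=0)   # renormalize over the samples
    return -(weights * rewards).sum()           # negative expected reward
```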

Unsupervised multimodal machine translation for low-resource distant language pairs

T Tayir, L Li - ACM Transactions on Asian and Low-Resource …, 2024 - dl.acm.org
Unsupervised machine translation (UMT) has recently attracted more attention from
researchers, enabling models to translate when languages lack parallel corpora. However …
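
Most unsupervised MT systems rest on iterative back-translation; a minimal sketch follows, with hypothetical `translate` and `train_step` helpers standing in for a real training loop, and two monolingual corpora in place of parallel data.

```python
# One round of iterative back-translation for unsupervised MT (a sketch).
def backtranslation_round(model_s2t, model_t2s, mono_src, mono_tgt):
    # Synthesize pseudo-parallel data by translating monolingual text.
    pseudo_tgt = [translate(model_s2t, s) for s in mono_src]
    pseudo_src = [translate(model_t2s, t) for t in mono_tgt]
    # Train each direction on the other direction's synthetic output.
    train_step(model_s2t, inputs=pseudo_src, targets=mono_tgt)
    train_step(model_t2s, inputs=pseudo_tgt, targets=mono_src)
```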

Very deep transformers for neural machine translation

X Liu, K Duh, L Liu, J Gao - arXiv preprint arXiv:2008.07772, 2020 - arxiv.org
We explore the application of very deep Transformer models for Neural Machine Translation
(NMT). Using a simple yet effective initialization technique that stabilizes training, we show …
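
The paper's stabilizer is the ADMIN initialization; as a generic illustration of the same idea (damping residual updates as the stack gets deeper), not the paper's exact recipe, here is a depth-scaled residual block in PyTorch.

```python
# A residual block whose update is shrunk by 1/sqrt(2N) for an N-layer stack
# (a generic depth-scaling illustration, not the ADMIN procedure itself).
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    def __init__(self, d_model, n_layers):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm = nn.LayerNorm(d_model)
        self.scale = (2 * n_layers) ** -0.5  # smaller updates as depth grows

    def forward(self, x):
        return self.norm(x + self.scale * self.ff(x))
```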

Aligned cross entropy for non-autoregressive machine translation

M Ghazvininejad, V Karpukhin… - International …, 2020 - proceedings.mlr.press
Non-autoregressive machine translation models significantly speed up decoding by
allowing for parallel prediction of the entire target sequence. However, modeling word order …
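
A simplified sketch of the alignment idea (the published AXE recurrence differs in its details): a monotonic dynamic program pays, for each target token, the cross entropy of the cheapest compatible prediction position, with a fixed penalty for predictions left unaligned.

```python
# Monotonic-alignment DP over a per-position negative-log-likelihood matrix.
import numpy as np

def aligned_xent(nll, skip=4.0):
    """nll: (n_preds, m_targets) matrix of -log P_i(y_j); requires n >= m."""
    n, m = nll.shape
    dp = np.full((n + 1, m + 1), np.inf)
    dp[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(m + 1):
            # Option 1: prediction i is left unaligned (pay the skip penalty).
            dp[i, j] = dp[i - 1, j] + skip
            # Option 2: align prediction i to target j.
            if j > 0:
                dp[i, j] = min(dp[i, j], dp[i - 1, j - 1] + nll[i - 1, j - 1])
    return dp[n, m]
```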

Fixed encoder self-attention patterns in transformer-based machine translation

A Raganato, Y Scherrer, J Tiedemann - arXiv preprint arXiv:2002.10260, 2020 - arxiv.org
Transformer-based models have brought a radical change to neural machine translation. A
key feature of the Transformer architecture is the so-called multi-head attention mechanism …
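
A minimal sketch of what a fixed (non-learned) attention head looks like in this line of work: a constant matrix that sends each position's attention to a fixed relative offset, such as the previous, current, or next token.

```python
# Fixed positional attention patterns replacing learned heads (a sketch).
import numpy as np

def fixed_pattern(seq_len, offset):
    """Attention matrix where position i attends to position i + offset."""
    A = np.zeros((seq_len, seq_len))
    for i in range(seq_len):
        j = min(max(i + offset, 0), seq_len - 1)  # clamp at the boundaries
        A[i, j] = 1.0
    return A

def fixed_heads_attention(values, offsets=(-1, 0, 1)):
    """values: (seq, d); one output per fixed head, concatenated."""
    return np.concatenate(
        [fixed_pattern(len(values), o) @ values for o in offsets], axis=-1)
```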

Dinoiser: Diffused conditional sequence learning by manipulating noises

J Ye, Z Zheng, Y Bao, L Qian, M Wang - arXiv preprint arXiv:2302.10025, 2023 - arxiv.org
While diffusion models have achieved great success in generating continuous signals such
as images and audio, it remains elusive for diffusion models in learning discrete sequence …
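
For orientation, the standard continuous forward-noising step that sequence diffusion models apply to token embeddings; the noise-scale schedule `alpha_bar`, which this line of work manipulates, is left abstract here as an assumed input.

```python
# DDPM-style forward corruption of token embeddings (a sketch).
import torch

def forward_diffuse(emb, t, alpha_bar):
    """emb: (seq, d) token embeddings; alpha_bar: (T,) cumulative schedule."""
    noise = torch.randn_like(emb)
    a = alpha_bar[t]
    # x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps
    return a.sqrt() * emb + (1 - a).sqrt() * noise, noise
```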