Neural machine translation: A review
F Stahlberg - Journal of Artificial Intelligence Research, 2020 - jair.org
The field of machine translation (MT), the automatic translation of written text from one
natural language into another, has experienced a major paradigm shift in recent years …
Transformers in time-series analysis: A tutorial
Transformer architectures have widespread applications, particularly in Natural Language
Processing and Computer Vision. Recently, Transformers have been employed in various …
Roformer: Enhanced transformer with rotary position embedding
Position encoding has recently been shown to be effective in the transformer architecture. It
enables valuable supervision for dependency modeling between elements at different …
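A minimal sketch of the rotary position embedding named in the title: each pair of feature dimensions is rotated by an angle that grows with the token position, so the dot product between rotated queries and keys depends on their relative offset. The half-split pairing, the dimension sizes, and the 10000 base are common implementation defaults assumed here, not details taken from the entry itself.

import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply a rotary position embedding to x of shape (seq_len, dim)."""
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per feature pair.
    freqs = base ** (-np.arange(half) / half)           # (half,)
    angles = np.arange(seq_len)[:, None] * freqs[None]  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

q = rotary_embed(np.random.randn(8, 64))
k = rotary_embed(np.random.randn(8, 64))
scores = q @ k.T  # attention logits now carry relative-position information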
On the variance of the adaptive learning rate and beyond
The learning rate warmup heuristic achieves remarkable success in stabilizing training,
accelerating convergence and improving generalization for adaptive stochastic optimization …
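A minimal sketch of the learning rate warmup heuristic the abstract refers to: the rate ramps up linearly for a fixed number of steps, then decays. The inverse-square-root decay and the step counts are illustrative defaults, not values from the paper, which instead analyzes why adaptive optimizers need warmup at all.

def warmup_lr(step, base_lr=1e-3, warmup_steps=4000):
    step = max(step, 1)
    if step < warmup_steps:
        return base_lr * step / warmup_steps       # linear warmup
    return base_lr * (warmup_steps / step) ** 0.5  # inverse-sqrt decay

for step in (1, 2000, 4000, 16000):
    print(step, warmup_lr(step))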
On layer normalization in the transformer architecture
The Transformer is widely used in natural language processing tasks. To train a Transformer,
however, one usually needs a carefully designed learning rate warm-up stage, which is …
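A minimal sketch of the design question this paper studies: where the layer normalization sits relative to the residual connection. The sublayer is an arbitrary placeholder (attention or feed-forward), and `norm` stands for any LayerNorm implementation; the function names are hypothetical.

def post_ln_block(x, sublayer, norm):
    # Original Transformer: normalize after adding the residual.
    return norm(x + sublayer(x))

def pre_ln_block(x, sublayer, norm):
    # Pre-LN variant: normalize the sublayer input; the residual path
    # stays unnormalized, which is what relaxes the warmup requirement.
    return x + sublayer(norm(x))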
Transforming machine translation: a deep learning system reaches news translation quality comparable to human professionals
The quality of human translation was long thought to be unattainable for computer
translation systems. In this study, we present a deep-learning system, CUBBITT, which …
A comparative study on transformer vs rnn in speech applications
Sequence-to-sequence models have been widely used in end-to-end speech processing,
for example, automatic speech recognition (ASR), speech translation (ST), and text-to …
Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned
Multi-head self-attention is a key component of the Transformer, a state-of-the-art
architecture for neural machine translation. In this work we evaluate the contribution made …
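A minimal sketch of head pruning in the spirit of this entry: multiply each head's output by a 0/1 gate and drop heads whose gate is zero. A plain mask is used here for illustration; it is not the stochastic relaxation the paper itself trains to decide which heads to keep.

import numpy as np

def gated_heads(head_outputs, gates):
    """head_outputs: (n_heads, seq_len, d_head); gates: (n_heads,) in {0, 1}."""
    return head_outputs * gates[:, None, None]

outs = np.random.randn(8, 16, 64)
gates = np.array([1, 1, 0, 1, 0, 0, 1, 1])  # prune heads 2, 4, and 5
pruned = gated_heads(outs, gates)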
Findings of the 2019 conference on machine translation (WMT19)
This paper presents the results of the premier shared task organized alongside the
Conference on Machine Translation (WMT) 2019. Participants were asked to build machine …
Revisiting few-sample BERT fine-tuning
This paper is a study of fine-tuning BERT contextual representations, with a focus on
commonly observed instabilities in few-sample scenarios. We identify several factors that …