Neural machine translation: A review

F Stahlberg - Journal of Artificial Intelligence Research, 2020 - jair.org
The field of machine translation (MT), the automatic translation of written text from one
natural language into another, has experienced a major paradigm shift in recent years …

Transformers in time-series analysis: A tutorial

S Ahmed, IE Nielsen, A Tripathi, S Siddiqui… - Circuits, Systems, and …, 2023 - Springer
Transformer architectures have widespread applications, particularly in Natural Language
Processing and Computer Vision. Recently, Transformers have been employed in various …
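
A minimal sketch of the kind of setup such a tutorial covers: a Transformer encoder applied to a fixed-length window of a univariate series for one-step-ahead forecasting. All sizes (window length, model dimensions) are illustrative assumptions, not taken from the paper.

```python
# Sketch: Transformer encoder over a windowed time series (assumed setup).
import torch
import torch.nn as nn

class TimeSeriesTransformer(nn.Module):
    def __init__(self, d_model=64, nhead=4, num_layers=2, window=96):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)                # scalar obs -> model dim
        self.pos = nn.Parameter(torch.zeros(window, d_model))  # learned position embeddings
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)                      # one-step-ahead forecast

    def forward(self, x):                                      # x: (batch, window, 1)
        h = self.input_proj(x) + self.pos
        h = self.encoder(h)
        return self.head(h[:, -1])                             # predict from last position

model = TimeSeriesTransformer()
y = model(torch.randn(8, 96, 1))                               # -> (8, 1)
```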

Roformer: Enhanced transformer with rotary position embedding

J Su, M Ahmed, Y Lu, S Pan, W Bo, Y Liu - Neurocomputing, 2024 - Elsevier
Position encoding has recently been shown to be effective in transformer architecture. It
enables valuable supervision for dependency modeling between elements at different …
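
A minimal sketch of rotary position embedding (RoPE) as the RoFormer paper describes it: each pair of query/key dimensions is rotated by a position-dependent angle, so attention dot products depend only on relative offsets. The base of 10000 follows the paper; the tensor shapes are illustrative.

```python
# Sketch: rotary position embedding applied to a (seq_len, dim) tensor.
import torch

def rotary_embed(x, base=10000.0):
    seq_len, dim = x.shape                                     # dim must be even
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)          # (seq, 1)
    freqs = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)  # (dim/2,)
    angles = pos * freqs                                       # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]                            # even/odd dimension pairs
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                         # 2-D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = rotary_embed(torch.randn(128, 64))   # queries; keys get the same treatment
```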

On the variance of the adaptive learning rate and beyond

L Liu, H Jiang, P He, W Chen, X Liu, J Gao… - arXiv preprint arXiv …, 2019 - arxiv.org
The learning rate warmup heuristic achieves remarkable success in stabilizing training,
accelerating convergence and improving generalization for adaptive stochastic optimization …
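
A minimal sketch of the warmup heuristic the paper analyzes: the learning rate is ramped up linearly over the first updates before the main schedule takes over. The optimizer, base rate, and step count below are illustrative assumptions.

```python
# Sketch: linear learning-rate warmup on top of Adam (assumed hyperparameters).
import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

warmup_steps = 4000
def warmup_factor(step):
    # ramps 0 -> 1 over warmup, then holds at 1 (constant-after-warmup schedule)
    return min(1.0, (step + 1) / warmup_steps)

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=warmup_factor)

for step in range(10):
    loss = model(torch.randn(32, 10)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()   # stepped per update, not per epoch, so the lambda sees step counts
```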

On layer normalization in the transformer architecture

R Xiong, Y Yang, D He, K Zheng… - International …, 2020 - proceedings.mlr.press
The Transformer is widely used in natural language processing tasks. To train a Transformer
however, one usually needs a carefully designed learning rate warm-up stage, which is …
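
A minimal sketch of the two residual-block layouts the paper contrasts: Post-LN (the original Transformer) normalizes after the residual sum, while Pre-LN normalizes the input to each sub-layer, which the paper shows trains stably without a warmup stage. Here `sublayer` stands in for either self-attention or the feed-forward network.

```python
# Sketch: Post-LN vs Pre-LN residual blocks (sublayer = attention or FFN).
import torch.nn as nn

def post_ln_block(x, sublayer, norm):
    return norm(x + sublayer(x))    # LayerNorm sits on the residual path

def pre_ln_block(x, sublayer, norm):
    return x + sublayer(norm(x))    # LayerNorm sits inside the residual branch
```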

Transforming machine translation: a deep learning system reaches news translation quality comparable to human professionals

M Popel, M Tomkova, J Tomek, Ł Kaiser… - Nature …, 2020 - nature.com
The quality of human translation was long thought to be unattainable for computer
translation systems. In this study, we present a deep-learning system, CUBBITT, which …

A comparative study on transformer vs RNN in speech applications

S Karita, N Chen, T Hayashi, T Hori… - 2019 IEEE automatic …, 2019 - ieeexplore.ieee.org
Sequence-to-sequence models have been widely used in end-to-end speech processing,
for example, automatic speech recognition (ASR), speech translation (ST), and text-to …

Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned

E Voita, D Talbot, F Moiseev, R Sennrich… - arXiv preprint arXiv …, 2019 - arxiv.org
Multi-head self-attention is a key component of the Transformer, a state-of-the-art
architecture for neural machine translation. In this work we evaluate the contribution made …
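
A minimal sketch in the spirit of this paper's pruning method: each head's output is scaled by a learnable gate, and heads whose gates are driven to zero can be removed after training. The paper uses a stochastic L0 relaxation; a plain magnitude penalty is substituted below for brevity.

```python
# Sketch: per-head gates on multi-head attention output (simplified penalty).
import torch
import torch.nn as nn

class GatedHeads(nn.Module):
    def __init__(self, num_heads=8):
        super().__init__()
        self.gates = nn.Parameter(torch.ones(num_heads))   # one scalar gate per head

    def forward(self, head_outputs):                       # (batch, heads, seq, head_dim)
        return head_outputs * self.gates.view(1, -1, 1, 1)

    def sparsity_penalty(self):
        # crude stand-in for the paper's L0 relaxation; added to the training loss
        return self.gates.abs().sum()
```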

Findings of the 2019 conference on machine translation (WMT19)

L Barrault, O Bojar, MR Costa-jussà, C Federmann… - 2019 - zora.uzh.ch
This paper presents the results of the premier shared task organized alongside the
Conference on Machine Translation (WMT) 2019. Participants were asked to build machine …

Revisiting few-sample BERT fine-tuning

T Zhang, F Wu, A Katiyar, KQ Weinberger… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper is a study of fine-tuning of BERT contextual representations, with focus on
commonly observed instabilities in few-sample scenarios. We identify several factors that …
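
A minimal sketch of one stabilization this paper studies: re-initializing the top encoder layers of a BERT-like model before fine-tuning. The attribute path `model.encoder.layers` and the init scale 0.02 are assumptions about the model at hand, not a fixed library API.

```python
# Sketch: re-initialize the top-n Transformer blocks before fine-tuning.
import torch.nn as nn

def reinit_top_layers(model, n=2, std=0.02):
    for layer in model.encoder.layers[-n:]:        # assumed module list of blocks
        for module in layer.modules():
            if isinstance(module, nn.Linear):
                nn.init.normal_(module.weight, mean=0.0, std=std)
                if module.bias is not None:
                    nn.init.zeros_(module.bias)
            elif isinstance(module, nn.LayerNorm):
                nn.init.ones_(module.weight)
                nn.init.zeros_(module.bias)
```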