Neural machine translation: A review

F Stahlberg - Journal of Artificial Intelligence Research, 2020 - jair.org
The field of machine translation (MT), the automatic translation of written text from one
natural language into another, has experienced a major paradigm shift in recent years …

Transformers in time-series analysis: A tutorial

S Ahmed, IE Nielsen, A Tripathi, S Siddiqui… - Circuits, Systems, and …, 2023 - Springer
Transformer architectures have widespread applications, particularly in Natural Language
Processing and Computer Vision. Recently, Transformers have been employed in various …
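
A minimal sketch of the kind of setup such a tutorial covers: a Transformer encoder applied to a fixed-length window of a univariate series for one-step-ahead forecasting. All sizes (window length, model dimensions) are illustrative assumptions, not taken from the paper.

```python
# Sketch: Transformer encoder over a windowed time series (assumed setup).
import torch
import torch.nn as nn

class TimeSeriesTransformer(nn.Module):
    def __init__(self, d_model=64, nhead=4, num_layers=2, window=96):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)                # scalar obs -> model dim
        self.pos = nn.Parameter(torch.zeros(window, d_model))  # learned position embeddings
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)                      # one-step-ahead forecast

    def forward(self, x):                                      # x: (batch, window, 1)
        h = self.input_proj(x) + self.pos
        h = self.encoder(h)
        return self.head(h[:, -1])                             # predict from last position

model = TimeSeriesTransformer()
y = model(torch.randn(8, 96, 1))                               # -> (8, 1)
```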

Roformer: Enhanced transformer with rotary position embedding

J Su, M Ahmed, Y Lu, S Pan, W Bo, Y Liu - Neurocomputing, 2024 - Elsevier
Position encoding has recently been shown to be effective in transformer architecture. It
enables valuable supervision for dependency modeling between elements at different …
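
A minimal sketch of rotary position embedding (RoPE) as the RoFormer paper describes it: each pair of query/key dimensions is rotated by a position-dependent angle, so attention dot products depend only on relative offsets. The base of 10000 follows the paper; the tensor shapes are illustrative.

```python
# Sketch: rotary position embedding applied to a (seq_len, dim) tensor.
import torch

def rotary_embed(x, base=10000.0):
    seq_len, dim = x.shape                                     # dim must be even
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)          # (seq, 1)
    freqs = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)  # (dim/2,)
    angles = pos * freqs                                       # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]                            # even/odd dimension pairs
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                         # 2-D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = rotary_embed(torch.randn(128, 64))   # queries; keys get the same treatment
```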

On the variance of the adaptive learning rate and beyond

L Liu, H Jiang, P He, W Chen, X Liu, J Gao… - arXiv preprint arXiv …, 2019 - arxiv.org
The learning rate warmup heuristic achieves remarkable success in stabilizing training,
accelerating convergence and improving generalization for adaptive stochastic optimization …
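
A minimal sketch of the warmup heuristic the paper analyzes: the learning rate is ramped up linearly over the first updates before the main schedule takes over. The optimizer, base rate, and step count below are illustrative assumptions.

```python
# Sketch: linear learning-rate warmup on top of Adam (assumed hyperparameters).
import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

warmup_steps = 4000
def warmup_factor(step):
    # ramps 0 -> 1 over warmup, then holds at 1 (constant-after-warmup schedule)
    return min(1.0, (step + 1) / warmup_steps)

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=warmup_factor)

for step in range(10):
    loss = model(torch.randn(32, 10)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()   # stepped per update, not per epoch, so the lambda sees step counts
```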

On layer normalization in the transformer architecture

R Xiong, Y Yang, D He, K Zheng… - International …, 2020 - proceedings.mlr.press
The Transformer is widely used in natural language processing tasks. To train a Transformer
however, one usually needs a carefully designed learning rate warm-up stage, which is …
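
A minimal sketch of the two residual-block layouts the paper contrasts: Post-LN (the original Transformer) normalizes after the residual sum, while Pre-LN normalizes the input to each sub-layer, which the paper shows trains stably without a warmup stage. Here `sublayer` stands in for either self-attention or the feed-forward network.

```python
# Sketch: Post-LN vs Pre-LN residual blocks (sublayer = attention or FFN).
import torch.nn as nn

def post_ln_block(x, sublayer, norm):
    return norm(x + sublayer(x))    # LayerNorm sits on the residual path

def pre_ln_block(x, sublayer, norm):
    return x + sublayer(norm(x))    # LayerNorm sits inside the residual branch
```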

Transforming machine translation: a deep learning system reaches news translation quality comparable to human professionals

M Popel, M Tomkova, J Tomek, Ł Kaiser… - Nature …, 2020 - nature.com
The quality of human translation was long thought to be unattainable for computer
translation systems. In this study, we present a deep-learning system, CUBBITT, which …

A comparative study on transformer vs RNN in speech applications

S Karita, N Chen, T Hayashi, T Hori… - 2019 IEEE automatic …, 2019 - ieeexplore.ieee.org
Sequence-to-sequence models have been widely used in end-to-end speech processing,
for example, automatic speech recognition (ASR), speech translation (ST), and text-to …

Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned

E Voita, D Talbot, F Moiseev, R Sennrich… - arXiv preprint arXiv …, 2019 - arxiv.org
Multi-head self-attention is a key component of the Transformer, a state-of-the-art
architecture for neural machine translation. In this work we evaluate the contribution made …
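
A minimal sketch in the spirit of this paper's pruning method: each head's output is scaled by a learnable gate, and heads whose gates are driven to zero can be removed after training. The paper uses a stochastic L0 relaxation; a plain magnitude penalty is substituted below for brevity.

```python
# Sketch: per-head gates on multi-head attention output (simplified penalty).
import torch
import torch.nn as nn

class GatedHeads(nn.Module):
    def __init__(self, num_heads=8):
        super().__init__()
        self.gates = nn.Parameter(torch.ones(num_heads))   # one scalar gate per head

    def forward(self, head_outputs):                       # (batch, heads, seq, head_dim)
        return head_outputs * self.gates.view(1, -1, 1, 1)

    def sparsity_penalty(self):
        # crude stand-in for the paper's L0 relaxation; added to the training loss
        return self.gates.abs().sum()
```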

Findings of the 2019 conference on machine translation (WMT19)

L Barrault, O Bojar, MR Costa-jussà, C Federmann… - 2019 - zora.uzh.ch
This paper presents the results of the premier shared task organized alongside the
Conference on Machine Translation (WMT) 2019. Participants were asked to build machine …

Revisiting few-sample BERT fine-tuning

T Zhang, F Wu, A Katiyar, KQ Weinberger… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper is a study of fine-tuning of BERT contextual representations, with focus on
commonly observed instabilities in few-sample scenarios. We identify several factors that …
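
A minimal sketch of one stabilization this paper studies: re-initializing the top encoder layers of a BERT-like model before fine-tuning. The attribute path `model.encoder.layers` and the init scale 0.02 are assumptions about the model at hand, not a fixed library API.

```python
# Sketch: re-initialize the top-n Transformer blocks before fine-tuning.
import torch.nn as nn

def reinit_top_layers(model, n=2, std=0.02):
    for layer in model.encoder.layers[-n:]:        # assumed module list of blocks
        for module in layer.modules():
            if isinstance(module, nn.Linear):
                nn.init.normal_(module.weight, mean=0.0, std=std)
                if module.bias is not None:
                    nn.init.zeros_(module.bias)
            elif isinstance(module, nn.LayerNorm):
                nn.init.ones_(module.weight)
                nn.init.zeros_(module.bias)
```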