Attention mechanism in neural networks: where it comes and where it goes

D Soydaner - Neural Computing and Applications, 2022 - Springer
A long time ago in the machine learning literature, the idea of incorporating a mechanism
inspired by the human visual system into neural networks was introduced. This idea is …

A survey on document-level neural machine translation: Methods and evaluation

S Maruf, F Saleh, G Haffari - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Machine translation (MT) is an important task in natural language processing (NLP), as it
automates the translation process and reduces the reliance on human translators. With the …

ETC: Encoding long and structured inputs in transformers

J Ainslie, S Ontanon, C Alberti, V Cvicek… - arXiv preprint arXiv …, 2020 - arxiv.org
Transformer models have advanced the state of the art in many Natural Language
Processing (NLP) tasks. In this paper, we present a new Transformer architecture, Extended …

A survey on green deep learning

J Xu, W Zhou, Z Fu, H Zhou, L Li - arXiv preprint arXiv:2111.05193, 2021 - arxiv.org
In recent years, larger and deeper models have been springing up, continuously pushing
state-of-the-art (SOTA) results across various fields like natural language processing (NLP) and …

Adaptively sparse transformers

GM Correia, V Niculae, AFT Martins - arXiv preprint arXiv:1909.00015, 2019 - arxiv.org
Attention mechanisms have become ubiquitous in NLP. Recent architectures, notably the
Transformer, learn powerful context-aware word representations through layered, multi …
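
The "layered, multi-head" attention mentioned in this snippet is built from dense scaled dot-product attention, in which every query position assigns a strictly positive weight to every key position because softmax never outputs exact zeros; sparse variants such as the one named in the title modify this normalization step so that irrelevant positions can receive exactly zero weight. Below is a minimal NumPy sketch of the standard dense computation for reference only, not the paper's own formulation:

    import numpy as np

    def softmax(z, axis=-1):
        # Numerically stable softmax along the given axis.
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(Q, K, V):
        # Dense attention: every query attends to every key with a strictly
        # positive weight, since softmax never produces exact zeros.
        d_k = Q.shape[-1]
        scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)  # (n_q, d_k) x (d_k, n_k) -> (n_q, n_k)
        return softmax(scores, axis=-1) @ V             # (n_q, n_k) x (n_k, d_v) -> (n_q, d_v)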

Scientific credibility of machine translation research: A meta-evaluation of 769 papers

B Marie, A Fujita, R Rubino - arXiv preprint arXiv:2106.15195, 2021 - arxiv.org
This paper presents the first large-scale meta-evaluation of machine translation (MT). We
annotated MT evaluations conducted in 769 research papers published from 2010 to 2020 …

A study on relu and softmax in transformer

K Shen, J Guo, X Tan, S Tang, R Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
The Transformer architecture consists of self-attention and feed-forward networks (FFNs)
which can be viewed as key-value memories according to previous works. However, FFN …
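
The "key-value memories" reading of the FFN mentioned in this snippet treats the first linear layer as a set of keys matched against the input and the second linear layer as a set of values combined according to the resulting match coefficients. A minimal NumPy sketch under that reading follows; the names ffn_as_memory, W_keys, b_keys, and W_values are illustrative, not taken from the paper:

    import numpy as np

    def ffn_as_memory(x, W_keys, b_keys, W_values):
        # Keys: each column of W_keys is matched against the input vector x,
        # yielding one non-negative coefficient per memory slot (ReLU activation).
        coeffs = np.maximum(0.0, x @ W_keys + b_keys)  # (d_model,) -> (d_ff,)
        # Values: the output is the coefficient-weighted sum of the rows of W_values.
        return coeffs @ W_values                       # (d_ff,) -> (d_model,)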

G-transformer for document-level machine translation

G Bao, Y Zhang, Z Teng, B Chen, W Luo - arXiv preprint arXiv:2105.14761, 2021 - arxiv.org
Document-level MT models are still far from satisfactory. Existing work extends the translation
unit from a single sentence to multiple sentences. However, studies show that when we further …

A simple and effective unified encoder for document-level machine translation

S Ma, D Zhang, M Zhou - Proceedings of the 58th Annual Meeting …, 2020 - aclanthology.org
Most of the existing models for document-level machine translation adopt dual-encoder
structures. The representations of the source sentences and the document-level contexts are …

Towards making the most of context in neural machine translation

Z Zheng, X Yue, S Huang, J Chen, A Birch - arXiv preprint arXiv …, 2020 - arxiv.org
Document-level machine translation manages to outperform sentence-level models by a
small margin, but has failed to be widely adopted. We argue that previous research did not …