Attention mechanism in neural networks: where it comes and where it goes

D Soydaner - Neural Computing and Applications, 2022 - Springer
A long time ago in the machine learning literature, the idea of incorporating a mechanism
inspired by the human visual system into neural networks was introduced. This idea is …

Transformers in time-series analysis: A tutorial

S Ahmed, IE Nielsen, A Tripathi, S Siddiqui… - Circuits, Systems, and …, 2023 - Springer
Transformer architectures have widespread applications, particularly in Natural Language
Processing and Computer Vision. Recently, Transformers have been employed in various …

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng, J Liu… - arxiv preprint arxiv …, 2023 - arxiv.org

Escaping the big data paradigm with compact transformers

A Hassani, S Walton, N Shah, A Abuduweili… - arxiv preprint arxiv …, 2021 - arxiv.org
With the rise of Transformers as the standard for language processing, and their
advancements in computer vision, there has been a corresponding growth in parameter size …

DeepNet: Scaling transformers to 1,000 layers

H Wang, S Ma, L Dong, S Huang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In this paper, we propose a simple yet effective method to stabilize extremely deep
Transformers. Specifically, we introduce a new normalization function (DeepNorm) to modify …

Stabilizing transformer training by preventing attention entropy collapse

S Zhai, T Likhomanenko, E Littwin… - International …, 2023 - proceedings.mlr.press
Training stability is of great importance to Transformers. In this work, we investigate the
training dynamics of Transformers by examining the evolution of the attention layers. In …

Cerebras-GPT: Open compute-optimal language models trained on the Cerebras wafer-scale cluster

N Dey, G Gosal, H Khachane, W Marshall… - arxiv preprint arxiv …, 2023 - arxiv.org
We study recent research advances that improve large language models through efficient
pre-training and scaling, and open datasets and tools. We combine these advances to …

BENDR: Using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data

D Kostas, S Aroca-Ouellette, F Rudzicz - Frontiers in Human …, 2021 - frontiersin.org
Deep neural networks (DNNs) used for brain–computer interface (BCI) classification are
commonly expected to learn general features when trained across a variety of contexts, such …