Attention mechanism in neural networks: where it comes and where it goes

D Soydaner - Neural Computing and Applications, 2022 - Springer
A long time ago in the machine learning literature, the idea of incorporating a mechanism
inspired by the human visual system into neural networks was introduced. This idea is …

Transformers in time-series analysis: A tutorial

S Ahmed, IE Nielsen, A Tripathi, S Siddiqui… - Circuits, Systems, and …, 2023 - Springer
Transformer architectures have widespread applications, particularly in Natural Language
Processing and Computer Vision. Recently, Transformers have been employed in various …

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng, J Liu… - arxiv preprint arxiv …, 2023 - arxiv.org

Escaping the big data paradigm with compact transformers

A Hassani, S Walton, N Shah, A Abuduweili… - arxiv preprint arxiv …, 2021 - arxiv.org
With the rise of Transformers as the standard for language processing, and their
advancements in computer vision, there has been a corresponding growth in parameter size …

DeepNet: Scaling transformers to 1,000 layers

H Wang, S Ma, L Dong, S Huang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In this paper, we propose a simple yet effective method to stabilize extremely deep
Transformers. Specifically, we introduce a new normalization function (DeepNorm) to modify …

Stabilizing transformer training by preventing attention entropy collapse

S Zhai, T Likhomanenko, E Littwin… - International …, 2023 - proceedings.mlr.press
Training stability is of great importance to Transformers. In this work, we investigate the
training dynamics of Transformers by examining the evolution of the attention layers. In …

Cerebras-GPT: Open compute-optimal language models trained on the Cerebras wafer-scale cluster

N Dey, G Gosal, H Khachane, W Marshall… - arxiv preprint arxiv …, 2023 - arxiv.org
We study recent research advances that improve large language models through efficient
pre-training and scaling, and open datasets and tools. We combine these advances to …

BENDR: Using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data

D Kostas, S Aroca-Ouellette, F Rudzicz - Frontiers in Human …, 2021 - frontiersin.org
Deep neural networks (DNNs) used for brain–computer interface (BCI) classification are
commonly expected to learn general features when trained across a variety of contexts, such …