Scan and snap: Understanding training dynamics and token composition in 1-layer transformer
Transformer architecture has shown impressive performance in multiple research domains
and has become the backbone of many neural network models. However, there is limited …
Max-margin token selection in attention mechanism
Attention mechanism is a central component of the transformer architecture which led to the
phenomenal success of large language models. However, the theoretical principles …
What can a single attention layer learn? a study through the random features lens
Attention layers---which map a sequence of inputs to a sequence of outputs---are core
building blocks of the Transformer architecture which has achieved significant …
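As a rough illustration of the sequence-to-sequence map such an attention layer computes, here is a minimal single-head dot-product attention sketch in NumPy. It is not the paper's random-features construction; the weight matrices and dimensions are hypothetical stand-ins.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_layer(X, W_q, W_k, W_v):
    """Single-head dot-product attention: maps a sequence X (n x d)
    to an output sequence of the same length."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (n x n) token-token similarities
    A = softmax(scores, axis=-1)              # each row is a distribution over tokens
    return A @ V                              # convex combinations of value vectors

# tiny usage example with random weights (hypothetical sizes)
rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
out = attention_layer(X, W_q, W_k, W_v)       # shape (5, 8)
```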
White-box transformers via sparse rate reduction
In this paper, we contend that the objective of representation learning is to compress and
transform the distribution of the data, say sets of tokens, towards a mixture of low …
On the role of attention in prompt-tuning
Prompt-tuning is an emerging strategy to adapt large language models (LLM) to
downstream tasks by learning a (soft-) prompt parameter from data. Despite its success in …
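For intuition, a minimal sketch of the soft-prompt idea follows: a small block of learnable prompt embeddings is prepended to the (frozen) token embeddings before they enter the model. The function name, shapes, and dimensions here are hypothetical, not the paper's setup.

```python
import numpy as np

def prepend_soft_prompt(token_embeddings, prompt_params):
    """Prompt-tuning in its simplest form: the model stays frozen and only
    `prompt_params`, a (p x d) block of embeddings, is learned; it is
    prepended to every input sequence."""
    return np.concatenate([prompt_params, token_embeddings], axis=0)

# hypothetical shapes: 4 learnable prompt vectors, 10 input tokens, dim 16
d, p, n = 16, 4, 10
prompt_params = 0.01 * np.random.randn(p, d)       # the only trainable tensor
token_embeddings = np.random.randn(n, d)           # output of the frozen embedding layer
augmented = prepend_soft_prompt(token_embeddings, prompt_params)   # (14, 16)
```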
Transformers as support vector machines
Since its inception in "Attention Is All You Need", transformer architecture has led to
revolutionary advancements in NLP. The attention layer within the transformer admits a …
In-context convergence of transformers
Transformers have recently revolutionized many domains in modern machine learning and
one salient discovery is their remarkable in-context learning capability, where models can …
On the Convergence and Sample Complexity Analysis of Deep Q-Networks with ε-Greedy Exploration
This paper provides a theoretical understanding of deep Q-Network (DQN) with the
$\varepsilon$-greedy exploration in deep reinforcement learning. Despite the tremendous …
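As a reference point, this is a minimal sketch of $\varepsilon$-greedy action selection on top of a learned Q-function; the Q-network itself is abstracted as an array of per-action values, and the names and numbers are illustrative assumptions rather than the paper's algorithm.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Epsilon-greedy exploration: with probability epsilon pick a uniformly
    random action, otherwise pick an action maximizing the current Q-estimate."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # explore
    return int(np.argmax(q_values))               # exploit

# usage with a stand-in Q-estimate (a DQN would produce this from a state)
rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.5, -0.2])
action = epsilon_greedy(q_values, epsilon=0.1, rng=rng)
```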
Joma: Demystifying multilayer transformers via joint dynamics of mlp and attention
We propose Joint MLP/Attention (JoMA) dynamics, a novel mathematical framework to
understand the training procedure of multilayer Transformer architectures. This is achieved …