Bypassing the exponential dependency: Looped transformers efficiently learn in-context by multi-step gradient descent

B Chen, X Li, Y Liang, Z Shi, Z Song - arXiv preprint arXiv:2410.11268, 2024 - arxiv.org
In-context learning has been recognized as a key factor in the success of Large Language
Models (LLMs). It refers to the model's ability to learn patterns on the fly from provided in …
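
For reference, the multi-step gradient descent procedure that the title refers to can be written out directly for an in-context linear regression prompt. The Python sketch below (function name, step count, and learning rate are illustrative choices, not taken from the paper) runs the classical algorithm on the context pairs; it shows the target computation, not the looped-transformer construction itself.

import numpy as np

def multistep_gd_predict(X, y, x_query, steps=10, lr=0.1):
    """Run `steps` of gradient descent on the least-squares loss over the
    in-context examples (X, y), then predict the label of x_query.
    Illustrative baseline only, not the transformer construction."""
    n, d = X.shape
    w = np.zeros(d)                      # start from the zero predictor
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n     # gradient of (1/2n) * ||Xw - y||^2
        w -= lr * grad                   # one in-context "GD step"
    return x_query @ w

# toy usage: 8 context pairs drawn from a random linear task
rng = np.random.default_rng(0)
w_star = rng.normal(size=4)
X = rng.normal(size=(8, 4))
y = X @ w_star
x_query = rng.normal(size=4)
print(multistep_gd_predict(X, y, x_query), x_query @ w_star)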

How transformers learn causal structure with gradient descent

E Nichani, A Damian, JD Lee - arXiv preprint arXiv:2402.14735, 2024 - arxiv.org
The incredible success of transformers on sequence modeling tasks can be largely
attributed to the self-attention mechanism, which allows information to be transferred …

GaLore: Memory-efficient LLM training by gradient low-rank projection

J Zhao, Z Zhang, B Chen, Z Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Training Large Language Models (LLMs) presents significant memory challenges,
predominantly due to the growing size of weights and optimizer states. Common memory …
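
The title's "gradient low-rank projection" can be illustrated with a minimal Python sketch, assuming the general recipe of projecting each gradient onto a low-rank subspace, stepping in that subspace, and mapping the update back; the SVD-based projector, the chosen rank, and the plain SGD step below are assumptions of this sketch rather than details drawn from the snippet.

import numpy as np

def low_rank_projector(grad, rank):
    """Top-`rank` left singular vectors of the gradient span the subspace."""
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    return U[:, :rank]                      # (m, r) projector

def projected_sgd_step(weight, grad, P, lr=1e-2):
    """Compress the gradient, step in the r-dimensional subspace, then map
    the update back to the full weight shape. Optimizer state (omitted here)
    would only need to be stored at the compressed size."""
    low_rank_grad = P.T @ grad              # (r, n): memory-cheap
    update = P @ (lr * low_rank_grad)       # back to (m, n)
    return weight - update

# toy usage on a single weight matrix
rng = np.random.default_rng(1)
W = rng.normal(size=(64, 32))
G = rng.normal(size=(64, 32))
P = low_rank_projector(G, rank=4)           # in practice, refresh P periodically
W = projected_sgd_step(W, G, P)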

Training dynamics of multi-head softmax attention for in-context learning: Emergence, convergence, and optimality

S Chen, H Sheen, T Wang, Z Yang - arXiv preprint arXiv:2402.19442, 2024 - arxiv.org
We study the dynamics of gradient flow for training a multi-head softmax attention model for
in-context learning of multi-task linear regression. We establish the global convergence of …
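
One common formalization of this setting (written from the standard ICL linear-regression setup, not necessarily the paper's exact parameterization) predicts the query label as an attention-weighted combination of the context labels:

\[
\hat{y}_{\mathrm{query}}
= \sum_{h=1}^{H} \sum_{i=1}^{N}
  \frac{\exp\!\big(x_{\mathrm{query}}^{\top} W_h\, x_i\big)}
       {\sum_{j=1}^{N} \exp\!\big(x_{\mathrm{query}}^{\top} W_h\, x_j\big)}
  \, v_h\, y_i,
\]

where $W_h$ merges the query and key matrices of head $h$, $v_h$ is a scalar value weight applied to the labels, and $(x_1, y_1), \dots, (x_N, y_N)$ are the in-context examples preceding the query $x_{\mathrm{query}}$.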

A primer on the inner workings of transformer-based language models

J Ferrando, G Sarti, A Bisazza… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid progress of research aimed at interpreting the inner workings of advanced
language models has highlighted a need for contextualizing the insights gained from years …

An information-theoretic analysis of in-context learning

HJ Jeon, JD Lee, Q Lei, B Van Roy - arXiv preprint arXiv:2401.15530, 2024 - arxiv.org
Previous theoretical results pertaining to meta-learning on sequences build on contrived
assumptions and are somewhat convoluted. We introduce new information-theoretic tools …

Training nonlinear transformers for efficient in-context learning: A theoretical learning and generalization analysis

H Li, M Wang, S Lu, X Cui, PY Chen - arXiv preprint arXiv …, 2024 - researchgate.net
Transformer-based large language models have displayed impressive in-context learning
capabilities, where a pre-trained model can handle new tasks without fine-tuning by simply …

Unveiling induction heads: Provable training dynamics and feature learning in transformers

S Chen, H Sheen, T Wang, Z Yang - arXiv preprint arXiv:2409.10559, 2024 - arxiv.org
In-context learning (ICL) is a cornerstone of large language model (LLM) functionality, yet its
theoretical foundations remain elusive due to the complexity of transformer architectures. In …
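
The induction-head behavior studied here is usually summarized as a copying rule: on a prompt ... [A][B] ... [A], the head attends back to the earlier occurrence of the current token and copies its successor. A minimal Python sketch of that input-output rule follows (token matching only, with no attention weights; the function name is illustrative).

def induction_prediction(tokens):
    """Return the induction-head guess for the next token: find the most
    recent earlier occurrence of the last token and copy its successor.
    Returns None when the last token has not appeared before."""
    last = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):   # scan right-to-left, skipping the final position
        if tokens[i] == last:
            return tokens[i + 1]
    return None

# toy usage: the bigram ("B", "C") recurs, so after "B" the rule predicts "C"
print(induction_prediction(["A", "B", "C", "D", "B"]))   # -> "C"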

In-context learning with transformers: Softmax attention adapts to function Lipschitzness

L Collins, A Parulekar, A Mokhtari, S Sanghavi… - arXiv preprint arXiv …, 2024 - arxiv.org
A striking property of transformers is their ability to perform in-context learning (ICL), a
machine learning framework in which the learner is presented with a novel context during …

Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers

Y Jiang, G Rajendran, P Ravikumar… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have the capacity to store and recall facts. Through
experimentation with open-source models, we observe that this ability to retrieve facts can …