Transformers as statisticians: Provable in-context learning with in-context algorithm selection

Y Bai, F Chen, H Wang, C Xiong… - Advances in neural …, 2023 - proceedings.neurips.cc
Neural sequence models based on the transformer architecture have demonstrated
remarkable in-context learning (ICL) abilities, where they can perform new tasks …

Training dynamics of multi-head softmax attention for in-context learning: Emergence, convergence, and optimality

S Chen, H Sheen, T Wang, Z Yang - arXiv preprint arXiv:2402.19442, 2024 - arxiv.org
We study the dynamics of gradient flow for training a multi-head softmax attention model for
in-context learning of multi-task linear regression. We establish the global convergence of …
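
To make the studied setup concrete, the sketch below builds a prompt of (x_i, y_i) pairs for a single linear-regression task and predicts the label of a query point with one softmax-attention readout over the examples. It is a toy, kernel-smoothing-style stand-in for illustration only, not the multi-head model or the gradient-flow training dynamics analyzed in the paper.

    # Toy illustration of in-context linear regression with a softmax-attention
    # readout (assumed setup for illustration; not the paper's model or training).
    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 5, 40                      # feature dimension, number of in-context examples
    w_star = rng.normal(size=d)       # hidden task vector drawn for this prompt
    X = rng.normal(size=(n, d))       # in-context inputs x_1, ..., x_n
    y = X @ w_star                    # their (noiseless) labels y_i = <x_i, w_star>
    x_query = rng.normal(size=d)      # query whose label must be inferred in context

    def softmax_attention_readout(X, y, x_query, beta=2.0):
        """Attend from the query to the examples and average their labels."""
        scores = beta * (X @ x_query)             # dot-product attention scores
        weights = np.exp(scores - scores.max())   # numerically stable softmax
        weights /= weights.sum()
        return weights @ y                        # attention-weighted label estimate

    print("in-context prediction:", softmax_attention_readout(X, y, x_query))
    print("ground truth         :", x_query @ w_star)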

Reason for future, act for now: A principled framework for autonomous LLM agents with provable sample efficiency

Z Liu, H Hu, S Zhang, H Guo, S Ke, B Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) demonstrate impressive reasoning abilities, but translating
reasoning into actions in the real world remains challenging. In particular, it remains unclear …

Approximation and estimation ability of transformers for sequence-to-sequence functions with infinite dimensional input

S Takakura, T Suzuki - International Conference on Machine …, 2023 - proceedings.mlr.press
Despite the great success of Transformer networks in various applications such as natural
language processing and computer vision, their theoretical aspects are not well understood …

A mechanism for sample-efficient in-context learning for sparse retrieval tasks

J Abernethy, A Agarwal, TV Marinov… - International …, 2024 - proceedings.mlr.press
We study the phenomenon of in-context learning (ICL) exhibited by large language models,
where they can adapt to a new learning task, given a handful of labeled examples, without …

Unveiling induction heads: Provable training dynamics and feature learning in transformers

S Chen, H Sheen, T Wang, Z Yang - arXiv preprint arXiv:2409.10559, 2024 - arxiv.org
In-context learning (ICL) is a cornerstone of large language model (LLM) functionality, yet its
theoretical foundations remain elusive due to the complexity of transformer architectures. In …
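
For context, an induction head implements the copying pattern [A][B] ... [A] -> [B]: it locates the most recent earlier occurrence of the current token and predicts the token that followed it. The snippet below hard-codes that rule as a plain function; it is only an illustration of the mechanism, not the trained transformer analyzed in the paper.

    # Hard-coded induction-head rule (illustration of the mechanism only).
    def induction_head_predict(tokens):
        """Predict each next token by prefix matching: find the most recent
        earlier occurrence of the current token and copy its successor."""
        predictions = []
        for t, current in enumerate(tokens):
            guess = None
            for s in range(t - 1, -1, -1):    # scan backwards over earlier positions
                if tokens[s] == current:
                    guess = tokens[s + 1]     # copy the token that followed the match
                    break
            predictions.append(guess)
        return predictions

    print(induction_head_predict(list("abcab")))
    # -> [None, None, None, 'b', 'c']: the repeated 'a' predicts 'b', the repeated 'b' predicts 'c'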

Understanding scaling laws with statistical and approximation theory for transformer neural networks on intrinsically low-dimensional data

A Havrilla, W Liao - arXiv preprint arXiv:2411.06646, 2024 - arxiv.org
When training deep neural networks, a model's generalization error is often observed to
follow a power scaling law dependent both on the model size and the data size. Perhaps the …
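
For orientation, a commonly used empirical form of such a power law (a generic ansatz in model size N and data size D, not the specific bound derived in this paper) is:

    % Generic scaling-law ansatz: error decays as a power of model size N and
    % data size D down to an irreducible floor; alpha, beta are fit empirically.
    \mathcal{L}(N, D) \approx A\, N^{-\alpha} + B\, D^{-\beta} + \mathcal{L}_{\infty}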

Sequence length independent norm-based generalization bounds for transformers

J Trauger, A Tewari - International Conference on Artificial …, 2024 - proceedings.mlr.press
This paper provides norm-based generalization bounds for the Transformer architecture that
do not depend on the input sequence length. We employ a covering number based …
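
As background for the covering-number route mentioned here, the generic uniform-convergence bound has the following shape (stated schematically, up to constants, for a bounded loss; the point of the paper is that the covering number of the Transformer class can be controlled without any dependence on the input sequence length):

    % Generic covering-number bound (schematic, not the paper's statement):
    % with probability at least 1 - delta over n i.i.d. samples,
    \sup_{f \in \mathcal{F}} \big| \widehat{R}_n(f) - R(f) \big|
      \;\lesssim\; \inf_{\epsilon > 0} \Big( \epsilon
        + \sqrt{\tfrac{\log \mathcal{N}(\mathcal{F}, \epsilon)}{n}} \Big)
      \;+\; \sqrt{\tfrac{\log(1/\delta)}{n}}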

Reason for future, act for now: A principled architecture for autonomous LLM agents

Z Liu, H Hu, S Zhang, H Guo, S Ke, B Liu… - Forty-first International …, 2024 - openreview.net
Large language models (LLMs) demonstrate impressive reasoning abilities, but translating
reasoning into actions in the real world remains challenging. In particular, it is unclear how …

Provable Convergence of Single-Timescale Neural Actor-Critic in Continuous Spaces

X Chen, F Zhang, G Wang, L Zhao - openreview.net
Actor-critic (AC) algorithms have been the powerhouse behind many successful yet
challenging applications. However, the theoretical understanding of finite-time convergence …
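
To illustrate what "single-timescale" means here, the sketch below runs a generic actor-critic loop on a toy one-dimensional continuous-state, continuous-action problem, updating actor and critic in the same iteration with step sizes of the same order. The linear critic and Gaussian policy are assumptions for illustration; this is not the neural parameterization or the analysis from the paper.

    # Generic single-timescale actor-critic sketch (toy illustration only).
    import numpy as np

    rng = np.random.default_rng(0)

    def env_step(s, a):
        """Toy 1-D continuous dynamics with quadratic cost."""
        reward = -(s**2 + 0.1 * a**2)
        s_next = 0.8 * s + 0.3 * a + 0.1 * rng.normal()
        return reward, s_next

    theta = 0.0                 # actor: a ~ N(theta * s, sigma^2), a linear-feedback policy
    w = np.zeros(2)             # critic: value estimate V(s) = w[0] + w[1] * s**2
    sigma, gamma = 0.5, 0.95
    alpha_actor, alpha_critic = 1e-3, 1e-3   # same order: the "single timescale"

    s = rng.normal()
    for t in range(20000):
        a = theta * s + sigma * rng.normal()
        r, s_next = env_step(s, a)

        phi, phi_next = np.array([1.0, s**2]), np.array([1.0, s_next**2])
        td_error = r + gamma * (w @ phi_next) - (w @ phi)

        w += alpha_critic * td_error * phi              # critic: TD(0) update
        score = (a - theta * s) * s / sigma**2          # d/dtheta log pi(a | s)
        theta += alpha_actor * td_error * score         # actor: policy-gradient update
        s = s_next

    print("learned feedback gain theta:", theta)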