Transformers as statisticians: Provable in-context learning with in-context algorithm selection
Neural sequence models based on the transformer architecture have demonstrated
remarkable in-context learning (ICL) abilities, where they can perform new tasks …
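The ICL setting discussed in this abstract is often formalized as few-shot linear regression presented in the prompt. Below is a minimal sketch of that setup, with ridge regression standing in for one of the in-context algorithms a transformer can emulate; the prompt format and the ridge baseline are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# A prompt consists of (x_i, y_i) demonstration pairs plus a query input x_q.
# Ridge regression fit on the demonstrations stands in for an "in-context
# algorithm"; this baseline is an illustrative assumption.

def make_icl_prompt(n_demos=20, dim=5, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)                      # task-specific weight vector
    X = rng.normal(size=(n_demos, dim))           # demonstration inputs
    y = X @ w + noise * rng.normal(size=n_demos)  # noisy labels
    x_query = rng.normal(size=dim)                # query the model must answer
    return X, y, x_query, w

def ridge_in_context(X, y, x_query, lam=0.1):
    # Closed-form ridge estimate computed only from the prompt's demonstrations.
    d = X.shape[1]
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return x_query @ w_hat

X, y, x_query, w_true = make_icl_prompt()
print("in-context ridge prediction:", ridge_in_context(X, y, x_query))
print("ground-truth value:         ", x_query @ w_true)
```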
Training dynamics of multi-head softmax attention for in-context learning: Emergence, convergence, and optimality
We study the dynamics of gradient flow for training a multi-head softmax attention model for
in-context learning of multi-task linear regression. We establish the global convergence of …
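A minimal sketch of the multi-task linear-regression ICL setup named in this abstract, with a single randomly initialized multi-head softmax attention layer acting on stacked (x, y) tokens; the dimensions, head count, and random weights are illustrative assumptions rather than the trained model analyzed in the paper.

```python
import numpy as np

# Tokens are the demonstration pairs z_i = (x_i, y_i) plus a query token
# (x_q, 0); the prediction is read off the label coordinate of the query
# token after one multi-head softmax attention layer.

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(Z, Wq, Wk, Wv, Wo):
    # Z: (seq, d_model); Wq/Wk/Wv: lists of per-head (d_model, d_head) arrays.
    heads = []
    for q, k, v in zip(Wq, Wk, Wv):
        Q, K, V = Z @ q, Z @ k, Z @ v
        A = softmax(Q @ K.T / np.sqrt(q.shape[1]))   # softmax attention weights
        heads.append(A @ V)
    return np.concatenate(heads, axis=-1) @ Wo       # (seq, d_model)

rng = np.random.default_rng(0)
d, n, H, dh = 5, 20, 2, 4                            # input dim, demos, heads, head dim
w = rng.normal(size=d)
X = rng.normal(size=(n, d)); y = X @ w
Z = np.concatenate([np.vstack([X, rng.normal(size=(1, d))]),   # inputs + query
                    np.append(y, 0.0)[:, None]], axis=1)       # labels, query label masked to 0
Wq = [rng.normal(size=(d + 1, dh)) for _ in range(H)]
Wk = [rng.normal(size=(d + 1, dh)) for _ in range(H)]
Wv = [rng.normal(size=(d + 1, dh)) for _ in range(H)]
Wo = rng.normal(size=(H * dh, d + 1))
out = multi_head_attention(Z, Wq, Wk, Wv, Wo)
print("query token's predicted label coordinate:", out[-1, -1])
```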
Reason for future, act for now: A principled framework for autonomous LLM agents with provable sample efficiency
Large language models (LLMs) demonstrate impressive reasoning abilities, but translating
reasoning into actions in the real world remains challenging. In particular, it remains unclear …
Approximation and estimation ability of transformers for sequence-to-sequence functions with infinite dimensional input
Despite the great success of Transformer networks in various applications such as natural
language processing and computer vision, their theoretical aspects are not well understood …
A mechanism for sample-efficient in-context learning for sparse retrieval tasks
We study the phenomenon of in-context learning (ICL) exhibited by large language models,
where they can adapt to a new learning task, given a handful of labeled examples, without …
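A minimal sketch of a sparse retrieval task of the kind named in the title: each label is one unknown coordinate of the input, and the task is identified from a handful of demonstrations. The exhaustive matching rule below is purely illustrative, not the mechanism the paper establishes.

```python
import numpy as np

# The hidden task is a single coordinate index; labels reveal that coordinate.
rng = np.random.default_rng(0)
dim, n_demos = 16, 5
target = rng.integers(dim)                    # hidden coordinate defining the task
X = rng.normal(size=(n_demos, dim))
y = X[:, target]                              # label = one sparse coordinate of x

# Recover the coordinate that exactly reproduces the demonstration labels.
recovered = int(np.argmin(((X - y[:, None]) ** 2).sum(axis=0)))
x_query = rng.normal(size=dim)
print("recovered correct coordinate:", recovered == target,
      "| prediction on query:", x_query[recovered])
```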
Unveiling induction heads: Provable training dynamics and feature learning in transformers
In-context learning (ICL) is a cornerstone of large language model (LLM) functionality, yet its
theoretical foundations remain elusive due to the complexity of transformer architectures. In …
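The induction-head behavior referred to here is usually described as a copy rule: locate the previous occurrence of the current token and predict the token that followed it. A minimal sketch of that rule, purely for illustration (the paper analyzes how transformers learn this behavior, not this lookup function):

```python
def induction_head_predict(tokens):
    # Copy rule: find the previous occurrence of the current token and
    # predict the token that followed it there.
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):   # scan the context right to left
        if tokens[i] == current:
            return tokens[i + 1]               # copy the continuation
    return None                                # no earlier occurrence found

print(induction_head_predict(list("abcab")))   # -> 'c'
```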
Understanding scaling laws with statistical and approximation theory for transformer neural networks on intrinsically low-dimensional data
When training deep neural networks, a model's generalization error is often observed to
follow a power scaling law dependent both on the model size and the data size. Perhaps the …
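For reference, the power scaling law mentioned in this abstract is commonly written as a joint power law in model size N and data size D; the parameterization below is the widely used form, stated as an assumption rather than the bound derived in the paper:

```latex
\mathcal{L}(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Here E is the irreducible error, and the exponents α and β control how the loss decays as the model and the data grow.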
Sequence length independent norm-based generalization bounds for transformers
This paper provides norm-based generalization bounds for the Transformer architecture that
do not depend on the input sequence length. We employ a covering number based …
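For context, covering-number arguments of this kind typically control the empirical Rademacher complexity through a Dudley-type entropy integral; the generic form below is the standard statement of that step, not the paper's specific sequence-length-independent bound:

```latex
\mathfrak{R}_n(\mathcal{F}) \;\lesssim\; \inf_{\alpha \ge 0}\left( \alpha + \frac{1}{\sqrt{n}} \int_{\alpha}^{D} \sqrt{\log \mathcal{N}(\varepsilon, \mathcal{F}, \|\cdot\|)}\, d\varepsilon \right)
```

Here N(ε, F, ∥·∥) is the ε-covering number of the function class F and D bounds its diameter; a sequence-length-independent generalization bound follows when log N itself does not grow with the sequence length.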
Reason for future, act for now: A principled architecture for autonomous LLM agents
Large language models (LLMs) demonstrate impressive reasoning abilities, but translating
reasoning into actions in the real world remains challenging. In particular, it is unclear how …
Provable Convergence of Single-Timescale Neural Actor-Critic in Continuous Spaces
X Chen, F Zhang, G Wang, L Zhao - openreview.net
Actor-critic (AC) algorithms have been the powerhouse behind many successful yet
challenging applications. However, the theoretical understanding of finite-time convergence …
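Single-timescale here means the actor and the critic are updated at every step with step sizes of the same order, instead of on separated schedules. Below is a minimal sketch of that update pattern with linear function approximation on a toy continuous state; the parameterization and the toy dynamics are illustrative assumptions, not the neural setting analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, eta, gamma = 3, 0.05, 0.99            # feature dim, shared step size, discount

theta = np.zeros(dim)                      # actor parameters (mean of a Gaussian policy)
w = np.zeros(dim)                          # critic parameters (linear value estimate)

def features(s):
    return np.array([s, s**2, 1.0])

s = 0.0
for t in range(1000):
    phi = features(s)
    a = phi @ theta + 0.1 * rng.normal()               # Gaussian policy with fixed noise
    r = -(s**2) - 0.1 * a**2                           # toy quadratic reward
    s_next = 0.9 * s + 0.1 * a + 0.01 * rng.normal()   # toy linear dynamics
    phi_next = features(s_next)

    td_error = r + gamma * (phi_next @ w) - phi @ w
    # Both updates use the same order of step size (single timescale).
    w += eta * td_error * phi                          # critic: TD update
    theta += eta * td_error * (a - phi @ theta) * phi  # actor: Gaussian score direction
    s = s_next

print("learned actor parameters:", theta)
```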