Dissecting the interplay of attention paths in a statistical mechanics theory of transformers
Despite the remarkable empirical performance of Transformers, their theoretical
understanding remains elusive. Here, we consider a deep multi-head self-attention network …
Transformers are minimax optimal nonparametric in-context learners
In-context learning (ICL) of large language models has proven to be a surprisingly effective
method of learning a new task from only a few demonstrative examples. In this paper, we …
In-context learning with representations: Contextual generalization of trained transformers
In-context learning (ICL) refers to a remarkable capability of pretrained large language
models, which can learn a new task given a few examples during inference. However …
Unveil benign overfitting for transformer in vision: Training dynamics, convergence, and generalization
Transformers have demonstrated great power in the recent development of large
foundational models. In particular, the Vision Transformer (ViT) has brought revolutionary …
Bypassing the exponential dependency: Looped transformers efficiently learn in-context by multi-step gradient descent
In-context learning has been recognized as a key factor in the success of Large Language
Models (LLMs). It refers to the model's ability to learn patterns on the fly from provided in …
Transformers learn nonlinear features in context: Nonconvex mean-field dynamics on the attention landscape
Large language models based on the Transformer architecture have demonstrated
impressive capabilities to learn in context. However, existing theoretical studies on how this …
Pretrained transformer efficiently learns low-dimensional target functions in-context
Transformers can efficiently learn in-context from example demonstrations. Most existing
theoretical analyses studied the in-context learning (ICL) ability of transformers for linear …
How does promoting the minority fraction affect generalization? A theoretical study of one-hidden-layer neural network on group imbalance
Group imbalance has been a known problem in empirical risk minimization (ERM), where
the achieved high average accuracy is accompanied by low accuracy in a minority group …
On mesa-optimization in autoregressively trained transformers: Emergence and capability
Autoregressively trained transformers have brought a profound revolution to the world,
especially with their in-context learning (ICL) ability to address downstream tasks. Recently …