A mathematical perspective on transformers
B Geshkovski, C Letrouit, Y Polyanskiy… - arXiv preprint, 2023 - arxiv.org
Transformers play a central role in the inner workings of large language models. We
develop a mathematical framework for analyzing Transformers based on their interpretation …
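For context, the interpretation referred to here treats the tokens x_1, …, x_n as interacting particles evolving through the layers. In a simplified single-head form with the query, key, and value matrices set to the identity, as this line of work often does for analysis, the continuous-time attention dynamics read

\[
\dot{x}_i(t) \;=\; \frac{1}{Z_i(t)} \sum_{j=1}^{n} e^{\beta \langle x_i(t),\, x_j(t) \rangle}\, x_j(t),
\qquad
Z_i(t) \;=\; \sum_{j=1}^{n} e^{\beta \langle x_i(t),\, x_j(t) \rangle},
\]

with β an inverse-temperature parameter; the paper additionally projects the dynamics onto the unit sphere to model layer normalization. This is a sketch of the setup, not the paper's full model.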
From local structures to size generalization in graph neural networks
Graph neural networks (GNNs) can process graphs of different sizes, but their ability to
generalize across sizes, specifically from small to large graphs, is still not well understood. In …
Sinkformers: Transformers with doubly stochastic attention
Attention-based models such as Transformers involve pairwise interactions between data
points, modeled with a learnable attention matrix. Importantly, this attention matrix is …
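Concretely, "doubly stochastic" here means that both the rows and the columns of the attention matrix sum to one; Sinkformers achieve this by replacing the row-wise softmax with a few Sinkhorn normalization steps. A minimal NumPy sketch of that normalization (function and variable names are ours, not the paper's code; the actual implementation works in the log domain for numerical stability):

```python
import numpy as np

def sinkhorn_attention(scores, n_iters=5):
    """Normalize raw attention logits into a doubly stochastic matrix.

    scores  : (n, n) attention logits, e.g. Q @ K.T / sqrt(d)
    n_iters : Sinkhorn iterations; a single row normalization alone
              would recover standard softmax attention.
    """
    A = np.exp(scores)                         # positive kernel matrix
    for _ in range(n_iters):
        A = A / A.sum(axis=1, keepdims=True)   # rows sum to 1
        A = A / A.sum(axis=0, keepdims=True)   # columns sum to 1
    return A

# Usage: apply the doubly stochastic attention matrix to values V.
rng = np.random.default_rng(0)
n, d = 4, 8
Q, Kmat, V = rng.normal(size=(3, n, d))
A = sinkhorn_attention(Q @ Kmat.T / np.sqrt(d))
out = A @ V   # after a few iterations, rows and columns of A both sum to ~1
```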
The exact sample complexity gain from invariances for kernel regression
In practice, encoding invariances into models improves sample complexity. In this work, we
study this phenomenon from a theoretical perspective. In particular, we provide minimax …
Learning with norm constrained, over-parameterized, two-layer neural networks
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space
for modeling functions computed by neural networks, as the curse of dimensionality (CoD) cannot be …
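For reference, the function class at issue is typically two-layer networks

\[
f_\theta(x) \;=\; \sum_{k=1}^{m} a_k\, \sigma\big(\langle w_k, x\rangle + b_k\big),
\]

with complexity measured not by the width m but by a norm such as the path norm \(\sum_{k=1}^{m} |a_k|\,\|w_k\|\). This is the standard formulation in this literature; the paper's exact norm constraint may differ in detail.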
How smooth is attention?
Self-attention and masked self-attention are at the heart of Transformers' outstanding
success. Still, our mathematical understanding of attention, in particular of its Lipschitz …
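The map whose regularity is at stake is single-head self-attention itself, which sends a sequence X = (x_1, …, x_n) to

\[
\mathrm{Att}(X)_i \;=\; \sum_{j=1}^{n} \operatorname{softmax}\!\Big(\tfrac{1}{\sqrt{d}}\,\langle Q x_i, K x_j\rangle\Big)_{j}\, V x_j,
\]

and the question is on which domains and in which norms this map admits (local) Lipschitz bounds; masked self-attention restricts the sum to j ≤ i.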
Universal approximation of symmetric and anti-symmetric functions
We consider universal approximations of symmetric and anti-symmetric functions, which are
important for applications in quantum physics, as well as other scientific and engineering …
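As a reminder of the definitions: for every permutation σ of {1, …, N},

\[
f\big(x_{\sigma(1)}, \dots, x_{\sigma(N)}\big) \;=\; f\big(x_1, \dots, x_N\big)
\quad \text{(symmetric)},
\qquad
f\big(x_{\sigma(1)}, \dots, x_{\sigma(N)}\big) \;=\; \operatorname{sgn}(\sigma)\, f\big(x_1, \dots, x_N\big)
\quad \text{(anti-symmetric)}.
\]

Anti-symmetry is the defining property of fermionic wave functions, which is the quantum-physics application alluded to.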
Deep neural network approximation of invariant functions through dynamical systems
We study the approximation of functions which are invariant with respect to certain
permutations of the input indices using flow maps of dynamical systems. Such invariant …
Learning theory of distribution regression with neural networks
In this paper, we aim to establish an approximation theory and a learning theory of
distribution regression via a fully connected neural network (FNN). In contrast to the classical …
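In distribution regression, each input is a probability distribution observed only through a finite sample drawn from it (the two-stage sampling setting), and the goal is to predict a scalar functional of the distribution. One common pipeline, and a plausible reading of the FNN setup described here though the paper's exact construction may differ, embeds each sample into a fixed-length mean-embedding feature vector and regresses on that. A self-contained sketch (all names and the toy target are ours):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def mean_embedding(sample, freqs):
    """Embed a sample {x_1, ..., x_m} in R^d as the empirical mean of
    random Fourier features: a fixed-length surrogate for the distribution."""
    proj = sample @ freqs                                  # (m, D)
    feats = np.concatenate([np.cos(proj), np.sin(proj)], axis=1)
    return feats.mean(axis=0)                              # (2D,)

# Two-stage sampling: each task i yields a sample from distribution P_i
# and a label y_i (here a toy functional: the per-coordinate variance).
d, D, n_dists, m = 2, 32, 200, 100
freqs = rng.normal(size=(d, D))
X, y = [], []
for _ in range(n_dists):
    s = rng.uniform(0.5, 2.0)                  # distribution parameter
    sample = rng.normal(scale=s, size=(m, d))  # observed sample from P_i
    X.append(mean_embedding(sample, freqs))
    y.append(s ** 2)                           # regression target f(P_i)

# Stage two: a fully connected network regresses on the embeddings.
net = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
net.fit(np.array(X), np.array(y))
```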
Deep learning theory of distribution regression with CNNs
Z Yu, DX Zhou - Advances in Computational Mathematics, 2023 - Springer
We establish a deep learning theory for distribution regression with deep convolutional
neural networks (DCNNs). Deep learning based on structured deep neural networks has …