A mathematical perspective on transformers

B Geshkovski, C Letrouit, Y Polyanskiy… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers play a central role in the inner workings of large language models. We
develop a mathematical framework for analyzing Transformers based on their interpretation …

From local structures to size generalization in graph neural networks

G Yehudai, E Fetaya, E Meirom… - International …, 2021 - proceedings.mlr.press
Graph neural networks (GNNs) can process graphs of different sizes, but their ability to
generalize across sizes, specifically from small to large graphs, is still not well understood. In …
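A minimal NumPy sketch of the point that one shared set of message-passing parameters applies to graphs of any size; the sum aggregation, layer sizes, and random graphs below are illustrative assumptions, not the construction studied in the paper.

```python
import numpy as np

def gnn_layer(A, H, W):
    """One message-passing layer: sum-aggregate neighbor features, then
    apply a shared linear map and a nonlinearity. The parameters W are
    independent of the graph size."""
    return np.tanh((A @ H) @ W)

def readout(H):
    """Graph-level readout: mean-pool node features."""
    return H.mean(axis=0)

rng = np.random.default_rng(0)
d, h = 3, 4
W = rng.normal(size=(d, h))

# The same layer processes a small graph and a larger graph.
for n in (4, 10):
    A = (rng.uniform(size=(n, n)) < 0.4).astype(float)
    A = np.maximum(A, A.T)                        # undirected adjacency
    H = rng.normal(size=(n, d))
    print(n, readout(gnn_layer(A, H, W)).shape)   # graph embedding of size (4,)
```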

Sinkformers: Transformers with doubly stochastic attention

ME Sander, P Ablin, M Blondel… - … Conference on Artificial …, 2022 - proceedings.mlr.press
Attention-based models such as Transformers involve pairwise interactions between data
points, modeled with a learnable attention matrix. Importantly, this attention matrix is …
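A minimal NumPy sketch of the idea behind doubly stochastic attention: the usual row-wise softmax is replaced by a few Sinkhorn normalization steps that push the attention matrix toward being doubly stochastic. Function names, dimensions, and the number of iterations are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sinkhorn_attention(X, Wq, Wk, Wv, n_iter=3):
    """Self-attention whose attention matrix is pushed toward a doubly
    stochastic matrix by alternating row/column normalizations (Sinkhorn)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    logits = Q @ K.T / np.sqrt(K.shape[1])
    A = np.exp(logits - logits.max())          # positive kernel matrix
    for _ in range(n_iter):                    # Sinkhorn iterations
        A /= A.sum(axis=1, keepdims=True)      # normalize rows
        A /= A.sum(axis=0, keepdims=True)      # normalize columns
    return A @ V

rng = np.random.default_rng(0)
n, d = 5, 4
X = rng.normal(size=(n, d))
W = [rng.normal(size=(d, d)) for _ in range(3)]
print(sinkhorn_attention(X, *W).shape)         # (5, 4)
```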

The exact sample complexity gain from invariances for kernel regression

B Tahmasebi, S Jegelka - Advances in Neural Information …, 2023 - proceedings.neurips.cc
In practice, encoding invariances into models improves sample complexity. In this work, we
study this phenomenon from a theoretical perspective. In particular, we provide minimax …

Learning with norm constrained, over-parameterized, two-layer neural networks

F Liu, L Dadi, V Cevher - Journal of Machine Learning Research, 2024 - jmlr.org
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space
for modeling functions learned by neural networks, as the curse of dimensionality (CoD) cannot be …

How smooth is attention?

V Castin, P Ablin, G Peyré - arXiv preprint arXiv:2312.14820, 2023 - arxiv.org
Self-attention and masked self-attention are at the heart of Transformers' outstanding
success. Still, our mathematical understanding of attention, in particular of its Lipschitz …
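One hedged way to make the Lipschitz question concrete is a numerical probe: perturb the input to a plain self-attention map and record the largest observed stretch ratio, which lower-bounds the local Lipschitz constant. The sketch below is such a probe under arbitrary random weights; it is not the paper's analysis.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Plain row-softmax self-attention."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return softmax(Q @ K.T / np.sqrt(K.shape[1])) @ V

def local_lipschitz_estimate(X, Ws, n_probes=200, eps=1e-4, seed=0):
    """Crude lower bound on the local Lipschitz constant near X:
    max ||f(X+d) - f(X)|| / ||d|| over random small perturbations d."""
    rng = np.random.default_rng(seed)
    base, best = self_attention(X, *Ws), 0.0
    for _ in range(n_probes):
        d = rng.normal(size=X.shape)
        d *= eps / np.linalg.norm(d)
        best = max(best, np.linalg.norm(self_attention(X + d, *Ws) - base) / eps)
    return best

rng = np.random.default_rng(1)
n, dim = 6, 4
X = rng.normal(size=(n, dim))
Ws = [rng.normal(size=(dim, dim)) for _ in range(3)]
print(local_lipschitz_estimate(X, Ws))
```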

Universal approximation of symmetric and anti-symmetric functions

J Han, Y Li, L Lin, J Lu, J Zhang, L Zhang - arXiv preprint arXiv …, 2019 - arxiv.org
We consider universal approximations of symmetric and anti-symmetric functions, which are
important for applications in quantum physics, as well as other scientific and engineering …
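For intuition, two standard building blocks in this line of work can be written in a few lines: a sum-pooled network that is permutation-symmetric by construction, and a determinant-based (Slater-style) ansatz that is anti-symmetric by construction. The sketch below is a toy illustration with assumed layer shapes, not the constructions analyzed in the paper.

```python
import numpy as np

def phi(X, W1, W2):
    """Per-particle feature map: a small two-layer network applied row-wise."""
    return np.tanh(X @ W1) @ W2

def symmetric_net(X, W1, W2, w):
    """Permutation-symmetric by construction: sum-pool particle features."""
    return float(np.tanh(phi(X, W1, W2).sum(axis=0)) @ w)

def antisymmetric_net(X, W1, W2):
    """Anti-symmetric by construction: determinant of a square feature matrix."""
    return float(np.linalg.det(phi(X, W1, W2)))

rng = np.random.default_rng(0)
n, d, h = 3, 2, 5
X = rng.normal(size=(n, d))
W1 = rng.normal(size=(d, h))
W2s = rng.normal(size=(h, h))   # hidden -> hidden for the symmetric net
W2a = rng.normal(size=(h, n))   # hidden -> n features so the matrix is square
w = rng.normal(size=h)

# Swapping two rows leaves the symmetric net unchanged and flips the sign
# of the anti-symmetric net.
Xswap = X[[1, 0, 2]]
print(np.isclose(symmetric_net(X, W1, W2s, w), symmetric_net(Xswap, W1, W2s, w)))
print(np.isclose(antisymmetric_net(X, W1, W2a), -antisymmetric_net(Xswap, W1, W2a)))
```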

Deep neural network approximation of invariant functions through dynamical systems

Q Li, T Lin, Z Shen - Journal of Machine Learning Research, 2024 - jmlr.org
We study the approximation of functions which are invariant with respect to certain
permutations of the input indices using flow maps of dynamical systems. Such invariant …
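A hedged sketch of the flow-map idea: integrate a permutation-equivariant vector field (a shared per-index map plus a pooled statistic) and finish with a symmetric readout, so the overall map is permutation-invariant. The specific field, step size, and readout below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def equivariant_field(X, W, W_pool):
    """Permutation-equivariant vector field: each index is updated by a
    shared map of its own state and a pooled (mean) statistic."""
    pooled = X.mean(axis=0, keepdims=True)
    return np.tanh(X @ W + pooled @ W_pool)

def flow_map(X, W, W_pool, steps=20, dt=0.05):
    """Forward-Euler integration of the equivariant dynamics."""
    for _ in range(steps):
        X = X + dt * equivariant_field(X, W, W_pool)
    return X

def invariant_readout(X, v):
    """Sum-pool after the flow, so the overall map is permutation-invariant."""
    return float(X.sum(axis=0) @ v)

rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))
W, W_pool = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v = rng.normal(size=d)

perm = rng.permutation(n)
out1 = invariant_readout(flow_map(X, W, W_pool), v)
out2 = invariant_readout(flow_map(X[perm], W, W_pool), v)
print(np.isclose(out1, out2))  # True: invariant to permuting the input indices
```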

Learning theory of distribution regression with neural networks

Z Shi, Z Yu, DX Zhou - arXiv preprint arXiv:2307.03487, 2023 - arxiv.org
In this paper, we aim to establish an approximation theory and a learning theory of
distribution regression via a fully connected neural network (FNN). In contrast to the classical …
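A minimal sketch of the generic two-stage distribution-regression pipeline such theories formalize: each input distribution is observed only through a bag of i.i.d. samples, the samples are embedded and mean-pooled, and a regressor is fit on the pooled embeddings. Here the embedding is a random-feature layer and the second stage is ridge regression; these are assumptions for illustration, not the FNN construction of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_bag(mu, m=100):
    """One input 'distribution', observed only through m i.i.d. samples."""
    return rng.normal(loc=mu, scale=1.0, size=(m, 1))

def pooled_embedding(bag, W1, b1):
    """Stage 1: embed each sample with a random-feature layer, then
    mean-pool (a Monte Carlo mean embedding of the distribution)."""
    return np.tanh(bag @ W1 + b1).mean(axis=0)

# Training set: distributions indexed by their mean mu, target y = mu^2.
h = 64
W1 = rng.normal(size=(1, h))
b1 = rng.normal(size=h)
mus = rng.uniform(-2, 2, size=200)
Z = np.stack([pooled_embedding(make_bag(mu), W1, b1) for mu in mus])
y = mus ** 2

# Stage 2: ridge regression on the pooled embeddings.
lam = 1e-3
w = np.linalg.solve(Z.T @ Z + lam * np.eye(h), Z.T @ y)

mu_test = 1.5
print(pooled_embedding(make_bag(mu_test), W1, b1) @ w, mu_test ** 2)
```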

Deep learning theory of distribution regression with CNNs

Z Yu, DX Zhou - Advances in Computational Mathematics, 2023 - Springer
We establish a deep learning theory for distribution regression with deep convolutional
neural networks (DCNNs). Deep learning based on structured deep neural networks has …