Towards a mathematical understanding of neural network-based machine learning: what we know and what we don't

C Ma, S Wojtowytsch, L Wu - arXiv preprint arXiv:2009.10713, 2020 - arxiv.org
The purpose of this article is to review the achievements made in the last few years towards
the understanding of the reasons behind the success and subtleties of neural network …

The Barron space and the flow-induced function spaces for neural network models

C Ma, L Wu - Constructive Approximation, 2022 - Springer
One of the key issues in the analysis of machine learning models is to identify the
appropriate function space and norm for the model. This is the set of functions endowed with …

Characterization of the variation spaces corresponding to shallow neural networks

JW Siegel, J Xu - Constructive Approximation, 2023 - Springer
We study the variation space corresponding to a dictionary of functions in L^2(Ω) for a
bounded domain Ω ⊂ R^d. Specifically, we compare the variation space, which is defined in …

Penalising the biases in norm regularisation enforces sparsity

E Boursier, N Flammarion - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Controlling the parameters' norm often yields good generalisation when training neural
networks. Beyond simple intuitions, the relation between regularising parameters' norm and …

Optimized injection of noise in activation functions to improve generalization of neural networks

F Duan, F Chapeau-Blondeau, D Abbott - Chaos, Solitons & Fractals, 2024 - Elsevier
This paper proposes a flexible probabilistic activation function that enhances the training
and operation of artificial neural networks by intentionally injecting noise to gain additional …

Transformers learn nonlinear features in context: Nonconvex mean-field dynamics on the attention landscape

J Kim, T Suzuki - arXiv preprint arXiv:2402.01258, 2024 - arxiv.org
Large language models based on the Transformer architecture have demonstrated
impressive capabilities to learn in context. However, existing theoretical studies on how this …

Minimum norm interpolation by perceptra: Explicit regularization and implicit bias

J Park, I Pelakh, S Wojtowytsch - Advances in Neural …, 2023 - proceedings.neurips.cc
We investigate how shallow ReLU networks interpolate between known regions. Our
analysis shows that empirical risk minimizers converge to a minimum norm interpolant as …

Generalization error bounds for deep neural networks trained by SGD

M Wang, C Ma - arXiv preprint arXiv:2206.03299, 2022 - arxiv.org
Generalization error bounds for deep neural networks trained by stochastic gradient descent
(SGD) are derived by combining a dynamical control of an appropriate parameter norm and …

Embeddings between Barron spaces with higher-order activation functions

TJ Heeringa, L Spek, FL Schwenninger… - Applied and …, 2024 - Elsevier
The approximation properties of infinitely wide shallow neural networks heavily depend on
the choice of the activation function. To understand this influence, we study embeddings …

Some observations on high-dimensional partial differential equations with barron data

E Weinan, S Wojtowytsch - Mathematical and Scientific …, 2022 - proceedings.mlr.press
We use explicit representation formulas to show that solutions to certain partial differential
equations lie in Barron spaces or multilayer spaces if the PDE data lie in such function …