Statistically meaningful approximation: a case study on approximating Turing machines with transformers

C Wei, Y Chen, T Ma - Advances in Neural Information …, 2022 - proceedings.neurips.cc
A common lens to theoretically study neural net architectures is to analyze the functions they
can approximate. However, the constructions from approximation theory often have …

A function space view of bounded norm infinite width ReLU nets: The multivariate case

G Ongie, R Willett, D Soudry, N Srebro - arXiv preprint arXiv:1910.01635, 2019 - arxiv.org
A key element of understanding the efficacy of overparameterized neural networks is
characterizing how they represent functions as the number of weights in the network …
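
As background for this entry, the standard infinite-width (integral) representation of a two-layer ReLU network and its associated norm are sketched below. The multivariate characterization in the paper itself involves heavier machinery (Radon-transform-based seminorms), so treat this only as the common setup, not the paper's result.

```latex
% Infinite-width two-layer ReLU network: finite sums of neurons are replaced
% by a signed measure \mu over the neuron parameters (w, b).
\[
  f_\mu(x) \;=\; \int_{\mathbb{S}^{d-1} \times \mathbb{R}}
      [\, w^\top x - b \,]_+ \, d\mu(w, b) \;+\; c^\top x + c_0 .
\]
% The "norm" of a function is the least total-variation mass of any
% representing measure, the infinite-width limit of the path norm:
\[
  \|f\| \;=\; \inf_{\mu \,:\, f_\mu = f} \, \|\mu\|_{\mathrm{TV}} .
\]
```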

On the effective number of linear regions in shallow univariate ReLU networks: Convergence guarantees and implicit bias

I Safran, G Vardi, JD Lee - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We study the dynamics and implicit bias of gradient flow (GF) on univariate ReLU neural
networks with a single hidden layer in a binary classification setting. We show that when the …
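
Since the snippet cuts off before the setup is complete, here is a minimal runnable sketch of the kind of experiment this result concerns: discretized gradient flow (small-step gradient descent) on a one-hidden-layer univariate ReLU net with logistic loss, followed by a crude count of the network's effective kinks (linear-region boundaries). All hyperparameters are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n, width = 20, 100
x = np.sort(rng.uniform(-1, 1, n))             # univariate inputs
y = np.sign(x + 0.1 * rng.standard_normal(n))  # noisy threshold labels
y[y == 0] = 1.0

w = 0.1 * rng.standard_normal(width)           # hidden-layer weights
b = 0.1 * rng.standard_normal(width)           # hidden-layer biases
v = 0.1 * rng.standard_normal(width)           # output weights

lr = 0.1
for step in range(5000):                       # small steps ~ gradient flow
    pre = np.outer(x, w) + b                   # (n, width) pre-activations
    act = np.maximum(pre, 0.0)
    f = act @ v
    g = -y / (1.0 + np.exp(y * f))             # d(logistic loss)/df, per point
    v -= lr * (act.T @ g) / n
    chain = (pre > 0) * v * g[:, None]         # (n, width) chain-rule factor
    w -= lr * (chain * x[:, None]).sum(0) / n
    b -= lr * chain.sum(0) / n

# Neuron j has a kink at x = -b_j / w_j; count it as "effective" only if it
# falls inside the data range and the neuron actually bends the function.
kinks = -b / np.where(w == 0, np.nan, w)
active = (np.abs(v * w) > 1e-6) & np.isfinite(kinks)
inside = (kinks > x.min()) & (kinks < x.max())
print("effective kinks in data range:", int(np.sum(active & inside)))
```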

A novel framework for policy mirror descent with general parameterization and linear convergence

C Alfano, R Yuan, P Rebeschini - Advances in Neural …, 2023 - proceedings.neurips.cc
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe
their success to the use of parameterized policies. However, while theoretical guarantees …
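
The update at the heart of policy mirror descent is easy to state in the tabular case with a KL mirror map: pi_{t+1} is proportional to pi_t * exp(eta * Q_t). The paper's contribution concerns general parameterized policies; the exact tabular sketch below, on a made-up toy MDP, only illustrates the update rule itself.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state distribution
R = rng.uniform(0, 1, (S, A))                # rewards

def q_values(pi):
    # Exact policy evaluation: V = (I - gamma * P_pi)^{-1} r_pi.
    P_pi = np.einsum('sa,sat->st', pi, P)
    r_pi = (pi * R).sum(1)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return R + gamma * P @ V                 # Q[s, a]

pi = np.full((S, A), 1.0 / A)                # uniform initial policy
eta = 1.0
for t in range(200):
    Q = q_values(pi)
    pi = pi * np.exp(eta * Q)                # mirror-descent step in KL geometry
    pi /= pi.sum(1, keepdims=True)

print("greedy actions:", q_values(pi).argmax(1))
print("policy:\n", np.round(pi, 3))
```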

Early-stopped neural networks are consistent

Z Ji, J Li, M Telgarsky - Advances in Neural Information …, 2021 - proceedings.neurips.cc
This work studies the behavior of shallow ReLU networks trained with the logistic loss via
gradient descent on binary classification data where the underlying data distribution is …
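
A minimal sketch of the early-stopping rule this result concerns: run gradient descent on the logistic loss and return the iterate with the best held-out loss instead of training to convergence. The shallow ReLU model and noisy data below are illustrative stand-ins, not the paper's exact setting.

```python
import numpy as np

rng = np.random.default_rng(1)
def make_data(n, d=5):
    X = rng.standard_normal((n, d))
    p = 1 / (1 + np.exp(-2 * X[:, 0]))         # noisy labels: Bayes risk > 0
    return X, np.where(rng.uniform(size=n) < p, 1.0, -1.0)

Xtr, ytr = make_data(200)
Xva, yva = make_data(200)
m, d = 64, Xtr.shape[1]
W = rng.standard_normal((d, m)) / np.sqrt(d)   # hidden weights
v = rng.standard_normal(m) / np.sqrt(m)        # output weights

def loss_and_grad(X, y, W, v):
    pre = X @ W
    act = np.maximum(pre, 0.0)
    f = act @ v
    loss = np.logaddexp(0.0, -y * f).mean()    # logistic loss, stable form
    g = -y / (1 + np.exp(y * f)) / len(y)
    gW = X.T @ ((pre > 0) * v * g[:, None])
    return loss, gW, act.T @ g

best = (np.inf, None)
lr = 0.5
for t in range(2000):
    _, gW, gv = loss_and_grad(Xtr, ytr, W, v)
    W -= lr * gW
    v -= lr * gv
    va_loss, _, _ = loss_and_grad(Xva, yva, W, v)
    if va_loss < best[0]:
        best = (va_loss, (W.copy(), v.copy())) # keep the best early iterate

W, v = best[1]
print("held-out loss at early stop:", round(best[0], 4))
```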

Mean-field multiagent reinforcement learning: A decentralized network approach

H Gu, X Guo, X Wei, R Xu - Mathematics of Operations …, 2024 - pubsonline.informs.org
One of the challenges for multiagent reinforcement learning (MARL) is designing efficient
learning algorithms for a large system in which each agent has only limited or partial …

Provable multi-task representation learning by two-layer ReLU neural networks

L Collins, H Hassani, M Soltanolkotabi… - … of machine learning …, 2024 - pmc.ncbi.nlm.nih.gov
An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on
many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear …
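
The adaptation step the snippet describes, re-training only the last linear layer on frozen features, is essentially a linear probe. A sketch follows, with a random stand-in for the pretrained feature map, since the point here is only the head-refitting step.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 10, 64
W_pre = rng.standard_normal((d, m)) / np.sqrt(d)  # frozen "pretrained" weights

def features(X):
    return np.maximum(X @ W_pre, 0.0)             # frozen ReLU features

# Downstream task: regression targets depending on two input coordinates.
X = rng.standard_normal((100, d))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.standard_normal(100)

# Linear probe = ridge regression on frozen features (only the head moves).
Phi = features(X)
lam = 1e-2
head = np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ y)

X_test = rng.standard_normal((100, d))
y_test = X_test[:, 0] - 0.5 * X_test[:, 1]
pred = features(X_test) @ head
print("test MSE of linear probe:", round(float(np.mean((pred - y_test) ** 2)), 4))
```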

Network size and size of the weights in memorization with two-layers neural networks

S Bubeck, R Eldan, YT Lee… - Advances in Neural …, 2020 - proceedings.neurips.cc
In 1988, Eric B. Baum showed that two-layers neural networks with threshold
activation function can perfectly memorize the binary labels of $n$ points in general …
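
For intuition on such memorization results, here is a naive O(n)-neuron threshold-network construction (not Baum's sharper one): project the points onto a generic direction so their projections are distinct, then bracket each positive point with a pair of threshold units.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.standard_normal((n, d))               # points in general position
y = rng.choice([-1.0, 1.0], size=n)

u = rng.standard_normal(d)                    # generic projection direction
z = X @ u                                     # distinct a.s. in general position
eps = 0.49 * np.min(np.diff(np.sort(z)))      # under half the smallest gap

def net(x):
    s = x @ u
    # Each positive point z_i gets 1[s > z_i - eps] - 1[s > z_i + eps],
    # which is 1 exactly in a window containing z_i and no other point.
    bumps = ((s > z[y > 0] - eps).astype(float)
             - (s > z[y > 0] + eps).astype(float))
    return np.sign(bumps.sum() - 0.5)         # threshold output unit

preds = np.array([net(x) for x in X])
print("memorized all labels:", bool(np.all(preds == y)))
```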

Feature selection with gradient descent on two-layer networks in low-rotation regimes

M Telgarsky - arXiv preprint arXiv:2208.02789, 2022 - arxiv.org
This work establishes low test error of gradient flow (GF) and stochastic gradient descent
(SGD) on two-layer ReLU networks with standard initialization, in three regimes where key …
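
A sketch of the feature-selection effect the snippet alludes to: when the target depends on a single coordinate, gradient descent on a two-layer ReLU net tends to rotate the first-layer weights toward that coordinate. Model, data, and step sizes below are illustrative choices, not the paper's regime.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, m = 400, 20, 32
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0])                           # only coordinate 0 matters

W = rng.standard_normal((d, m)) / np.sqrt(d)
v = rng.choice([-1.0, 1.0], m) / np.sqrt(m)    # fixed signs, as in many analyses

def alignment(W):
    # Fraction of first-layer weight mass on the relevant coordinate.
    return float(np.mean(np.abs(W[0]) / np.linalg.norm(W, axis=0)))

print("alignment at init:", round(alignment(W), 3))
lr = 0.5
for t in range(3000):
    pre = X @ W
    f = np.maximum(pre, 0.0) @ v
    g = -y / (1 + np.exp(y * f)) / n           # logistic-loss gradient wrt f
    W -= lr * (X.T @ ((pre > 0) * v * g[:, None]))  # train first layer only
print("alignment after GD:", round(alignment(W), 3))
```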

Variational temporal convolutional networks for I-FENN thermoelasticity

DW Abueidda, ME Mobasher - Computer Methods in Applied Mechanics …, 2024 - Elsevier
Machine learning (ML) has been used to solve multiphysics problems like
thermoelasticity through multi-layer perceptron (MLP) networks. However, MLPs have high …
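
For readers unfamiliar with temporal convolutional networks, the sketch below shows their basic building block: a causal, dilated 1-D convolution with a residual connection. This is generic TCN background (in PyTorch), not the paper's I-FENN architecture; the channel counts and kernel size are arbitrary.

```python
import torch
import torch.nn as nn

class CausalConvBlock(nn.Module):
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation       # left-pad to stay causal
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):                             # x: (batch, channels, time)
        out = self.conv(nn.functional.pad(x, (self.pad, 0)))
        return self.act(out) + x                      # residual connection

# Stack blocks with doubling dilation so the receptive field grows quickly.
tcn = nn.Sequential(*[CausalConvBlock(8, dilation=2 ** i) for i in range(4)])
y = tcn(torch.randn(1, 8, 100))
print(y.shape)                                        # torch.Size([1, 8, 100])
```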