An overview of multi-agent reinforcement learning from game theoretical perspective

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org
Following the remarkable success of the AlphaGo series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

Scan and snap: Understanding training dynamics and token composition in 1-layer transformer

Y Tian, Y Wang, B Chen, SS Du - Advances in Neural …, 2023 - proceedings.neurips.cc
Transformer architecture has shown impressive performance in multiple research domains
and has become the backbone of many neural network models. However, there is limited …

The modern mathematics of deep learning

J Berner, P Grohs, G Kutyniok… - arXiv preprint arXiv …, 2021 - cambridge.org
We describe the new field of the mathematical analysis of deep learning. This field emerged
around a list of research questions that were not answered within the classical framework of …

JoMA: Demystifying multilayer transformers via joint dynamics of MLP and attention

Y Tian, Y Wang, Z Zhang, B Chen, S Du - arXiv preprint arXiv:2310.00535, 2023 - arxiv.org
We propose Joint MLP/Attention (JoMA) dynamics, a novel mathematical framework to
understand the training procedure of multilayer Transformer architectures. This is achieved …

Deep generalized Schrödinger bridge

GH Liu, T Chen, O So… - Advances in Neural …, 2022 - proceedings.neurips.cc
Mean-Field Game (MFG) serves as a crucial mathematical framework in modeling
the collective behavior of individual agents interacting stochastically with a large population …
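
For context, the classical Schrödinger bridge that this work generalizes has a compact stochastic-control statement (standard background, not quoted from the paper): steer a diffusion between two prescribed marginals with minimal control energy.

```latex
% Classical Schr\"odinger bridge in stochastic-control form (standard
% background, not the paper's generalized formulation):
\min_{u}\; \mathbb{E}\!\left[\int_0^1 \tfrac{1}{2}\,\|u_t\|^2 \, dt\right]
\quad \text{s.t.} \quad
dX_t = u_t\, dt + \sigma\, dW_t, \qquad X_0 \sim \mu_0,\; X_1 \sim \mu_1 .
```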

ZiCo: Zero-shot NAS via inverse coefficient of variation on gradients

G Li, Y Yang, K Bhardwaj, R Marculescu - arXiv preprint arXiv:2301.11300, 2023 - arxiv.org
Neural Architecture Search (NAS) is widely used to automatically obtain the neural network
with the best performance among a large number of candidate architectures. To reduce the …
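
The statistic in the title can be sketched directly: rank an untrained network by how consistent its gradient magnitudes are across batches. Below is a minimal PyTorch sketch; the per-parameter inverse coefficient of variation (mean of |gradient| divided by its standard deviation) is the core idea, while the per-tensor log-sum aggregation here is an assumption that may differ from the paper, and `zico_style_score`, `batches`, and `loss_fn` are illustrative names.

```python
# Minimal sketch of a ZiCo-style zero-cost proxy. Needs at least two
# batches so the per-parameter standard deviation is defined.
import torch
import torch.nn as nn

def zico_style_score(model: nn.Module, batches, loss_fn) -> float:
    grads = {n: [] for n, p in model.named_parameters() if p.requires_grad}
    for x, y in batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                grads[n].append(p.grad.detach().abs().flatten())
    score = 0.0
    for g in grads.values():
        if len(g) < 2:
            continue
        stacked = torch.stack(g)            # (num_batches, num_params)
        mean = stacked.mean(dim=0)          # per-parameter mean |grad|
        std = stacked.std(dim=0) + 1e-9     # per-parameter std across batches
        score += (mean / std).sum().log().item()  # inverse CV, log-summed per tensor
    return score
```

A higher score is then used as a training-free ranking signal across candidate architectures.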

Neural network approximation: Three hidden layers are enough

Z Shen, H Yang, S Zhang - Neural Networks, 2021 - Elsevier
A three-hidden-layer neural network with super approximation power is introduced. This
network is built with the floor function (⌊x⌋), the exponential function (2^x), the step function …
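
The three activation primitives the snippet names are easy to state on their own; the sketch below only defines them (the indicator form of the step function is an assumption here), while the paper's explicit three-hidden-layer construction that composes them is far more involved.

```python
# The three activation primitives from the construction, as plain functions.
import numpy as np

def floor_act(x):
    return np.floor(x)                        # floor: x -> ⌊x⌋

def exp2_act(x):
    return np.exp2(x)                         # base-2 exponential: x -> 2^x

def step_act(x):
    return (np.asarray(x) >= 0).astype(float) # step (assumed indicator 1_{x >= 0})
```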

A rigorous framework for the mean field limit of multilayer neural networks

PM Nguyen, HT Pham - Mathematical Statistics and Learning, 2023 - ems.press
We develop a mathematically rigorous framework for multilayer neural networks in the mean
field regime. As the network's widths increase, the network's learning trajectory is shown to …
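
As background for the regime the abstract references, the two-layer mean-field limit is classically described by a Wasserstein gradient flow over the distribution of neurons (this is the standard two-layer statement; the paper's contribution is a rigorous multilayer extension):

```latex
% Standard two-layer mean-field limit (background, not the multilayer
% result): as width grows, gradient descent on
%   f_\rho(x) = \int \sigma(x;\theta)\,\rho(d\theta)
% follows the Wasserstein gradient flow of the risk R over \rho.
\partial_t \rho_t = \nabla_\theta \cdot\!\left(\rho_t\, \nabla_\theta\, \frac{\delta R(\rho_t)}{\delta \rho}\right),
\qquad
R(\rho) = \mathbb{E}_{(x,y)}\big[\ell(f_\rho(x),\, y)\big].
```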

Two-layer neural networks for partial differential equations: Optimization and generalization theory

T Luo, H Yang - arXiv preprint arXiv:2006.15733, 2020 - arxiv.org
The problem of solving partial differential equations (PDEs) can be formulated into a
least-squares minimization problem, where neural networks are used to parametrize PDE …
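
The least-squares formulation the snippet describes is concrete enough to sketch: parametrize the PDE solution with a small network and minimize the squared residual at sampled collocation points. The toy problem below (−u″ = f on [0, 1] with zero boundary values) and every hyperparameter are illustrative choices, not the paper's setting.

```python
# Least-squares PDE solving with a two-layer network: minimize the squared
# residual of -u'' = f plus a boundary penalty, with f chosen so that the
# true solution is u*(x) = sin(pi x).
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
f = lambda x: (torch.pi ** 2) * torch.sin(torch.pi * x)

for step in range(2000):
    x = torch.rand(128, 1, requires_grad=True)            # interior collocation points
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    residual = (-d2u - f(x)).pow(2).mean()                # least-squares PDE residual
    boundary = net(torch.tensor([[0.0], [1.0]])).pow(2).mean()  # u(0) = u(1) = 0
    loss = residual + boundary
    opt.zero_grad()
    loss.backward()
    opt.step()
```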

Over-parameterization exponentially slows down gradient descent for learning a single neuron

W Xu, S Du - The Thirty Sixth Annual Conference on …, 2023 - proceedings.mlr.press
We revisit the canonical problem of learning a single neuron with ReLU activation under
Gaussian input with square loss. We particularly focus on the over-parameterization setting …
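
The setting itself fits in a few lines: a single ReLU teacher neuron under Gaussian input with square loss, and an over-parameterized student that sums n ReLU units, trained by plain gradient descent on fresh samples. The width, initialization scale, step size, and batch size below are illustrative; the paper's subject is the exact convergence rate, which a sketch like this can only exhibit empirically.

```python
# Over-parameterized student (sum of n ReLU units) learning one ReLU teacher
# neuron under Gaussian inputs with square loss, via plain gradient descent.
import torch

torch.manual_seed(0)
d, n = 10, 50                                   # input dim; n > 1 means over-parameterized
lr = 0.5 / n                                    # step size scaled with width (illustrative)
v = torch.randn(d); v /= v.norm()               # teacher neuron
W = (0.1 * torch.randn(n, d)).requires_grad_()  # student neurons, small init

for step in range(5000):
    x = torch.randn(4096, d)                    # fresh Gaussian inputs each step
    y = torch.relu(x @ v)                       # single-neuron teacher
    pred = torch.relu(x @ W.T).sum(dim=1)       # over-parameterized student
    loss = 0.5 * (pred - y).pow(2).mean()       # square loss
    loss.backward()
    with torch.no_grad():
        W -= lr * W.grad
        W.grad.zero_()
```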