An overview of multi-agent reinforcement learning from game theoretical perspective

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org
Following the remarkable success of the AlphaGo series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

Scan and snap: Understanding training dynamics and token composition in 1-layer transformer

Y Tian, Y Wang, B Chen, SS Du - Advances in Neural …, 2023 - proceedings.neurips.cc
Transformer architecture has shown impressive performance in multiple research domains
and has become the backbone of many neural network models. However, there is limited …

The modern mathematics of deep learning

J Berner, P Grohs, G Kutyniok… - arXiv preprint arXiv …, 2021 - cambridge.org
We describe the new field of the mathematical analysis of deep learning. This field emerged
around a list of research questions that were not answered within the classical framework of …

JoMA: Demystifying multilayer transformers via joint dynamics of MLP and attention

Y Tian, Y Wang, Z Zhang, B Chen, S Du - arXiv preprint arXiv:2310.00535, 2023 - arxiv.org
We propose Joint MLP/Attention (JoMA) dynamics, a novel mathematical framework to
understand the training procedure of multilayer Transformer architectures. This is achieved …

Deep generalized Schrödinger bridge

GH Liu, T Chen, O So… - Advances in Neural …, 2022 - proceedings.neurips.cc
Mean-Field Game (MFG) serves as a crucial mathematical framework in modeling
the collective behavior of individual agents interacting stochastically with a large population …
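
For context, the classical Schrödinger bridge that this work generalizes has a compact stochastic-control statement (standard background, not quoted from the paper): steer a diffusion between two prescribed marginals with minimal control energy.

```latex
% Classical Schr\"odinger bridge in stochastic-control form (standard
% background, not the paper's generalized formulation):
\min_{u}\; \mathbb{E}\!\left[\int_0^1 \tfrac{1}{2}\,\|u_t\|^2 \, dt\right]
\quad \text{s.t.} \quad
dX_t = u_t\, dt + \sigma\, dW_t, \qquad X_0 \sim \mu_0,\; X_1 \sim \mu_1 .
```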

ZiCo: Zero-shot NAS via inverse coefficient of variation on gradients

G Li, Y Yang, K Bhardwaj, R Marculescu - arXiv preprint arXiv:2301.11300, 2023 - arxiv.org
Neural Architecture Search (NAS) is widely used to automatically obtain the neural network
with the best performance among a large number of candidate architectures. To reduce the …
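
The statistic in the title can be sketched directly: rank an untrained network by how consistent its gradient magnitudes are across batches. Below is a minimal PyTorch sketch; the per-parameter inverse coefficient of variation (mean of |gradient| divided by its standard deviation) is the core idea, while the per-tensor log-sum aggregation here is an assumption that may differ from the paper, and `zico_style_score`, `batches`, and `loss_fn` are illustrative names.

```python
# Minimal sketch of a ZiCo-style zero-cost proxy. Needs at least two
# batches so the per-parameter standard deviation is defined.
import torch
import torch.nn as nn

def zico_style_score(model: nn.Module, batches, loss_fn) -> float:
    grads = {n: [] for n, p in model.named_parameters() if p.requires_grad}
    for x, y in batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                grads[n].append(p.grad.detach().abs().flatten())
    score = 0.0
    for g in grads.values():
        if len(g) < 2:
            continue
        stacked = torch.stack(g)            # (num_batches, num_params)
        mean = stacked.mean(dim=0)          # per-parameter mean |grad|
        std = stacked.std(dim=0) + 1e-9     # per-parameter std across batches
        score += (mean / std).sum().log().item()  # inverse CV, log-summed per tensor
    return score
```

A higher score is then used as a training-free ranking signal across candidate architectures.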

Neural network approximation: Three hidden layers are enough

Z Shen, H Yang, S Zhang - Neural Networks, 2021 - Elsevier
A three-hidden-layer neural network with super approximation power is introduced. This
network is built with the floor function (⌊x⌋), the exponential function (2^x), the step function …
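
The three activation primitives the snippet names are easy to state on their own; the sketch below only defines them (the indicator form of the step function is an assumption here), while the paper's explicit three-hidden-layer construction that composes them is far more involved.

```python
# The three activation primitives from the construction, as plain functions.
import numpy as np

def floor_act(x):
    return np.floor(x)                        # floor: x -> ⌊x⌋

def exp2_act(x):
    return np.exp2(x)                         # base-2 exponential: x -> 2^x

def step_act(x):
    return (np.asarray(x) >= 0).astype(float) # step (assumed indicator 1_{x >= 0})
```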

A rigorous framework for the mean field limit of multilayer neural networks

PM Nguyen, HT Pham - Mathematical Statistics and Learning, 2023 - ems.press
We develop a mathematically rigorous framework for multilayer neural networks in the mean
field regime. As the network's widths increase, the network's learning trajectory is shown to …
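
As background for the regime the abstract references, the two-layer mean-field limit is classically described by a Wasserstein gradient flow over the distribution of neurons (this is the standard two-layer statement; the paper's contribution is a rigorous multilayer extension):

```latex
% Standard two-layer mean-field limit (background, not the multilayer
% result): as width grows, gradient descent on
%   f_\rho(x) = \int \sigma(x;\theta)\,\rho(d\theta)
% follows the Wasserstein gradient flow of the risk R over \rho.
\partial_t \rho_t = \nabla_\theta \cdot\!\left(\rho_t\, \nabla_\theta\, \frac{\delta R(\rho_t)}{\delta \rho}\right),
\qquad
R(\rho) = \mathbb{E}_{(x,y)}\big[\ell(f_\rho(x),\, y)\big].
```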

Two-layer neural networks for partial differential equations: Optimization and generalization theory

T Luo, H Yang - arXiv preprint arXiv:2006.15733, 2020 - arxiv.org
The problem of solving partial differential equations (PDEs) can be formulated into a
least-squares minimization problem, where neural networks are used to parametrize PDE …
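
The least-squares formulation the snippet describes is concrete enough to sketch: parametrize the PDE solution with a small network and minimize the squared residual at sampled collocation points. The toy problem below (−u″ = f on [0, 1] with zero boundary values) and every hyperparameter are illustrative choices, not the paper's setting.

```python
# Least-squares PDE solving with a two-layer network: minimize the squared
# residual of -u'' = f plus a boundary penalty, with f chosen so that the
# true solution is u*(x) = sin(pi x).
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
f = lambda x: (torch.pi ** 2) * torch.sin(torch.pi * x)

for step in range(2000):
    x = torch.rand(128, 1, requires_grad=True)            # interior collocation points
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    residual = (-d2u - f(x)).pow(2).mean()                # least-squares PDE residual
    boundary = net(torch.tensor([[0.0], [1.0]])).pow(2).mean()  # u(0) = u(1) = 0
    loss = residual + boundary
    opt.zero_grad()
    loss.backward()
    opt.step()
```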

Over-parameterization exponentially slows down gradient descent for learning a single neuron

W Xu, S Du - The Thirty Sixth Annual Conference on …, 2023 - proceedings.mlr.press
We revisit the canonical problem of learning a single neuron with ReLU activation under
Gaussian input with square loss. We particularly focus on the over-parameterization setting …
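
The setting itself fits in a few lines: a single ReLU teacher neuron under Gaussian input with square loss, and an over-parameterized student that sums n ReLU units, trained by plain gradient descent on fresh samples. The width, initialization scale, step size, and batch size below are illustrative; the paper's subject is the exact convergence rate, which a sketch like this can only exhibit empirically.

```python
# Over-parameterized student (sum of n ReLU units) learning one ReLU teacher
# neuron under Gaussian inputs with square loss, via plain gradient descent.
import torch

torch.manual_seed(0)
d, n = 10, 50                                   # input dim; n > 1 means over-parameterized
lr = 0.5 / n                                    # step size scaled with width (illustrative)
v = torch.randn(d); v /= v.norm()               # teacher neuron
W = (0.1 * torch.randn(n, d)).requires_grad_()  # student neurons, small init

for step in range(5000):
    x = torch.randn(4096, d)                    # fresh Gaussian inputs each step
    y = torch.relu(x @ v)                       # single-neuron teacher
    pred = torch.relu(x @ W.T).sum(dim=1)       # over-parameterized student
    loss = 0.5 * (pred - y).pow(2).mean()       # square loss
    loss.backward()
    with torch.no_grad():
        W -= lr * W.grad
        W.grad.zero_()
```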