A selective overview of deep learning

J Fan, C Ma, Y Zhong - Statistical science: a review journal of …, 2020 - pmc.ncbi.nlm.nih.gov
Deep learning has achieved tremendous success in recent years. In simple words, deep
learning uses the composition of many nonlinear functions to model the complex …

[PDF][PDF] The computational limits of deep learning

NC Thompson, K Greenewald, K Lee… - arXiv preprint arXiv …, 2020 - assets.pubpub.org
Deep learning's recent history has been one of achievement: from triumphing over humans
in the game of Go to world-leading performance in image classification, voice recognition …

Learning imbalanced datasets with label-distribution-aware margin loss

K Cao, C Wei, A Gaidon… - Advances in neural …, 2019 - proceedings.neurips.cc
Deep learning algorithms can fare poorly when the training dataset suffers from heavy class-
imbalance but the testing criterion requires good generalization on less frequent classes …

Fantastic generalization measures and where to find them

Y Jiang, B Neyshabur, H Mobahi, D Krishnan… - arXiv preprint arXiv …, 2019 - arxiv.org
Generalization of deep networks has been of great interest in recent years, resulting in a
number of theoretically and empirically motivated complexity measures. However, most …

Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks

S Arora, S Du, W Hu, Z Li… - … Conference on Machine …, 2019 - proceedings.mlr.press
Recent works have cast some light on the mystery of why deep nets fit any data and
generalize despite being very overparametrized. This paper analyzes training and …

A theoretical analysis of deep Q-learning

J Fan, Z Wang, Y Xie, Z Yang - Learning for dynamics and …, 2020 - proceedings.mlr.press
Despite the great empirical success of deep reinforcement learning, its theoretical
foundation is less well understood. In this work, we make the first attempt to theoretically …

The pitfalls of simplicity bias in neural networks

H Shah, K Tamuly, A Raghunathan… - Advances in …, 2020 - proceedings.neurips.cc
Several works have proposed Simplicity Bias (SB)---the tendency of standard training
procedures such as Stochastic Gradient Descent (SGD) to find simple models---to justify why …

Gradient descent optimizes over-parameterized deep ReLU networks

D Zou, Y Cao, D Zhou, Q Gu - Machine learning, 2020 - Springer
We study the problem of training deep fully connected neural networks with Rectified Linear
Unit (ReLU) activation function and cross entropy loss function for binary classification using …

Frequency principle: Fourier analysis sheds light on deep neural networks

ZQJ Xu, Y Zhang, T Luo, Y Xiao, Z Ma - arXiv preprint arXiv:1901.06523, 2019 - arxiv.org
We study the training process of Deep Neural Networks (DNNs) from the Fourier analysis
perspective. We demonstrate a very universal Frequency Principle (F-Principle)---DNNs …

Generalization bounds of stochastic gradient descent for wide and deep neural networks

Y Cao, Q Gu - Advances in neural information processing …, 2019 - proceedings.neurips.cc
We study the training and generalization of deep neural networks (DNNs) in the over-
parameterized regime, where the network width (i.e., number of hidden nodes per layer) is …