Towards understanding ensemble, knowledge distillation and self-distillation in deep learning
We formally study how ensembles of deep learning models can improve test accuracy, and
how the superior performance of the ensemble can be distilled into a single model using …
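For context only, here is a minimal sketch of the standard knowledge-distillation loss the abstract alludes to (soft targets from a teacher, such as an averaged ensemble, mixed with hard labels). The temperature T and weight alpha are illustrative choices, not values from this work, and the code is not the paper's construction.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard (Hinton-style) distillation objective: KL divergence on
    temperature-softened teacher/student distributions plus cross-entropy
    on the hard labels. T and alpha are illustrative, not from the paper."""
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # The T^2 factor keeps the soft-target gradients on a scale comparable to the hard loss.
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

In an ensemble-distillation setting, teacher_logits could simply be the average of the member models' logits; for self-distillation, the teacher could be an earlier trained copy of the same architecture.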
Learning and generalization in overparameterized neural networks, going beyond two layers
What can ResNet learn efficiently, going beyond kernels?
How can neural networks such as ResNet efficiently learn CIFAR-10 with test
accuracy more than $96\%$, while other methods, especially kernel methods, fall relatively …
Feature purification: How adversarial training performs robust deep learning
Despite the empirical success of using adversarial training to defend deep learning models
against adversarial perturbations, it remains rather unclear what the principles are …
FL-NTK: A neural tangent kernel-based framework for federated learning analysis
Federated Learning (FL) is an emerging learning scheme that allows different distributed
clients to train deep neural networks together without data sharing. Neural networks have …
On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $\varepsilon$-Greedy Exploration
This paper provides a theoretical understanding of deep Q-Network (DQN) with the
$\varepsilon$-greedy exploration in deep reinforcement learning. Despite the tremendous …
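For reference, a minimal sketch of the $\varepsilon$-greedy rule named in the title (not the paper's DQN analysis): with probability $\varepsilon$ take a uniformly random action, otherwise act greedily with respect to the current Q-estimates.

```python
import random

def epsilon_greedy_action(q_values, epsilon):
    """epsilon-greedy exploration: with probability epsilon pick a uniformly
    random action index, otherwise the action with the highest Q-estimate."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```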
Learning over-parametrized two-layer neural networks beyond NTK
We consider the dynamics of gradient descent for learning a two-layer neural network. We
assume the input $x\in\mathbb{R}^d$ is drawn from a Gaussian distribution and the label …
Bounding the width of neural networks via coupled initialization: a worst case analysis
A common method in training neural networks is to initialize all the weights to be
independent Gaussian vectors. We observe that by instead initializing the weights into …
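As a point of reference for the "common method" the abstract describes, below is a minimal sketch of independent Gaussian initialization for a two-layer ReLU network; the paper's coupled alternative is truncated in the snippet above and is not reproduced here. The shapes, scale, and the random ±1 output weights are illustrative conventions, not the paper's exact setup.

```python
import torch

def gaussian_init_two_layer(d, m, std=1.0):
    """Baseline initialization: each of the m hidden weight vectors is an
    independent Gaussian in R^d. Conventions here are illustrative only."""
    W = torch.randn(m, d) * std        # m independent Gaussian rows
    a = torch.sign(torch.randn(m))     # random +/-1 output weights (a common convention)
    return W, a
```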
Hardness of noise-free learning for two-hidden-layer neural networks
We give superpolynomial statistical query (SQ) lower bounds for learning two-hidden-layer
ReLU networks with respect to Gaussian inputs in the standard (noise-free) model. No …