Towards understanding ensemble, knowledge distillation and self-distillation in deep learning

Z Allen-Zhu, Y Li - arXiv preprint arXiv:2012.09816, 2020 - arxiv.org
We formally study how ensembles of deep learning models can improve test accuracy, and
how the superior performance of an ensemble can be distilled into a single model using …
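
For context, a minimal sketch (PyTorch) of the knowledge-distillation step the snippet alludes to: the averaged soft predictions of an ensemble serve as the target for a single student model. The ensemble size, temperature, and random logits below are illustrative assumptions, not the paper's actual setup.

```python
# Minimal knowledge-distillation sketch (PyTorch). Ensemble size, temperature,
# and the random logits are illustrative assumptions, not the paper's setup.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, temperature=4.0):
    # Average the ensemble's softened predictions to form the teacher target.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    # KL divergence between the student's softened predictions and the target.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Usage with random logits: batch of 8 examples, 10 classes, 3 ensemble members.
student_out = torch.randn(8, 10)
ensemble_outs = [torch.randn(8, 10) for _ in range(3)]
loss = distillation_loss(student_out, ensemble_outs)
```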

Learning and generalization in overparameterized neural networks, going beyond two layers

Z Allen-Zhu, Y Li, Y Liang - Advances in neural information …, 2019 - proceedings.neurips.cc

What can ResNet learn efficiently, going beyond kernels?

Z Allen-Zhu, Y Li - Advances in Neural Information …, 2019 - proceedings.neurips.cc
How can neural networks such as ResNet efficiently learn CIFAR-10 with test
accuracy more than 96%, while other methods, especially kernel methods, fall relatively …

Feature purification: How adversarial training performs robust deep learning

Z Allen-Zhu, Y Li - 2021 IEEE 62nd Annual Symposium on …, 2022 - ieeexplore.ieee.org
Despite the empirical success of using adversarial training to defend deep learning models
against adversarial perturbations, it remains rather unclear what the principles are …
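
As a rough illustration of the adversarial training referred to above, here is a single FGSM-style training step (PyTorch); the model, optimizer, and perturbation budget epsilon are assumptions, and the paper's analysis is not reproduced here.

```python
# One adversarial training step (PyTorch): perturb the batch with a single
# FGSM-style step, then update on the perturbed inputs. The model, optimizer,
# and epsilon budget are assumptions for illustration.
import torch
import torch.nn.functional as F

def adversarial_step(model, optimizer, x, y, epsilon=8 / 255):
    # Inner maximization: one signed-gradient ascent step on the input.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    x_adv = (x_adv + epsilon * grad.sign()).clamp(0.0, 1.0).detach()
    # Outer minimization: standard gradient step on the adversarial batch.
    optimizer.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    optimizer.step()
```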

FL-NTK: A neural tangent kernel-based framework for federated learning analysis

B Huang, X Li, Z Song, X Yang - International Conference on …, 2021 - proceedings.mlr.press
Federated Learning (FL) is an emerging learning scheme that allows different distributed
clients to train deep neural networks together without data sharing. Neural networks have …
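
A hedged sketch of the "train together without data sharing" scheme the snippet describes, in the form of one federated-averaging round; the client data loaders, model, and number of local steps are illustrative assumptions, and the paper's NTK-based analysis itself is not code.

```python
# Federated averaging sketch (PyTorch): every client trains a local copy on
# its own data, then the server averages the parameters. Client loaders,
# the model, and the number of local steps are illustrative assumptions.
import copy
import torch
import torch.nn.functional as F

def federated_round(global_model, client_loaders, local_steps=5, lr=0.01):
    client_states = []
    for loader in client_loaders:
        local_model = copy.deepcopy(global_model)
        opt = torch.optim.SGD(local_model.parameters(), lr=lr)
        for _, (x, y) in zip(range(local_steps), loader):
            opt.zero_grad()
            F.cross_entropy(local_model(x), y).backward()
            opt.step()
        client_states.append(local_model.state_dict())
    # Server aggregation: element-wise average of the client parameters.
    avg_state = {
        k: torch.stack([s[k].float() for s in client_states]).mean(dim=0)
        for k in client_states[0]
    }
    global_model.load_state_dict(avg_state)
    return global_model
```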

On the Convergence and Sample Complexity Analysis of Deep Q-Networks with ε-Greedy Exploration

S Zhang, H Li, M Wang, M Liu… - Advances in …, 2024 - proceedings.neurips.cc
This paper provides a theoretical understanding of deep Q-Network (DQN) with the
ε-greedy exploration in deep reinforcement learning. Despite the tremendous …
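
A short sketch of the ε-greedy exploration rule analyzed in the paper; the Q-network architecture, state dimension, and ε value are illustrative assumptions.

```python
# Epsilon-greedy action selection for a DQN agent (PyTorch). The Q-network
# architecture, state dimension, and epsilon value are illustrative assumptions.
import random
import torch

def select_action(q_network, state, num_actions, epsilon=0.1):
    # With probability epsilon explore uniformly; otherwise act greedily on Q.
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        return int(q_network(state.unsqueeze(0)).argmax(dim=-1))

# Usage: a toy Q-network with 4-dimensional states and 2 actions.
q_net = torch.nn.Sequential(
    torch.nn.Linear(4, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
)
action = select_action(q_net, torch.randn(4), num_actions=2)
```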

Learning over-parametrized two-layer neural networks beyond NTK

Y Li, T Ma, HR Zhang - Conference on learning theory, 2020 - proceedings.mlr.press
We consider the dynamics of gradient descent for learning a two-layer neural network. We
assume the input $x \in \mathbb{R}^d$ is drawn from a Gaussian distribution and the label …
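
A toy version of the setting in the snippet: gradient descent on a two-layer ReLU network with Gaussian inputs x ∈ R^d. The synthetic label function, network width, and learning rate below are assumptions, not the paper's target class.

```python
# Toy version of the setting in the snippet: gradient descent on a two-layer
# ReLU network with Gaussian inputs. The label function, width, and step size
# are illustrative assumptions, not the paper's target class.
import torch
import torch.nn.functional as F

d, width, n = 10, 256, 1024
x = torch.randn(n, d)                    # inputs x ~ N(0, I_d)
y = (x[:, 0] * x[:, 1]).unsqueeze(1)     # assumed synthetic label function

model = torch.nn.Sequential(
    torch.nn.Linear(d, width), torch.nn.ReLU(), torch.nn.Linear(width, 1)
)
opt = torch.optim.SGD(model.parameters(), lr=0.05)
for _ in range(200):
    opt.zero_grad()
    F.mse_loss(model(x), y).backward()
    opt.step()
```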

Bounding the width of neural networks via coupled initialization: a worst-case analysis

A Munteanu, S Omlor, Z Song… - … on Machine Learning, 2022 - proceedings.mlr.press
A common method in training neural networks is to initialize all the weights to be
independent Gaussian vectors. We observe that by instead initializing the weights into …
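
A hedged sketch of the coupled alternative to independent Gaussian initialization that the snippet points to: one common variant duplicates each first-layer weight vector into a pair with opposite output-layer signs, so the network output is exactly zero at initialization. The paper's precise construction may differ.

```python
# Coupled-initialization sketch: neurons come in pairs with identical
# first-layer weights and opposite output-layer signs, so the network output
# is exactly zero at initialization. Widths are assumptions, and the paper's
# exact construction may differ from this common variant.
import torch

def coupled_init(d, width):
    assert width % 2 == 0
    half = torch.randn(width // 2, d)        # independent Gaussian rows
    W1 = torch.cat([half, half], dim=0)      # duplicate rows into coupled pairs
    a = torch.cat([torch.ones(width // 2), -torch.ones(width // 2)])
    return W1, a

W1, a = coupled_init(d=10, width=128)
x = torch.randn(10)
print(a @ torch.relu(W1 @ x))  # exactly zero at initialization
```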

Hardness of noise-free learning for two-hidden-layer neural networks

S Chen, A Gollakota, A Klivans… - Advances in Neural …, 2022 - proceedings.neurips.cc
We give superpolynomial statistical query (SQ) lower bounds for learning two-hidden-layer
ReLU networks with respect to Gaussian inputs in the standard (noise-free) model. No …
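
For concreteness, a tiny sketch of the function class named in the snippet: a two-hidden-layer ReLU network evaluated on Gaussian inputs with noise-free labels. The widths and weights are arbitrary assumptions; the statistical-query hardness argument itself is not something to code.

```python
# Sketch of the function class named in the snippet: a two-hidden-layer ReLU
# network on Gaussian inputs with noise-free labels. Widths and weights are
# arbitrary assumptions; the SQ hardness argument itself is not code.
import torch

d, k = 20, 8
W1, W2, w3 = torch.randn(k, d), torch.randn(k, k), torch.randn(k)

def target(x):
    # f(x) = w3 · ReLU(W2 · ReLU(W1 x))
    return w3 @ torch.relu(W2 @ torch.relu(W1 @ x))

x = torch.randn(d)      # Gaussian input, as in the noise-free model
label = target(x)       # noise-free label f(x)
```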