The benefits of mixup for feature learning

D Zou, Y Cao, Y Li, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
Mixup, a simple data augmentation method that randomly mixes two data points via linear
interpolation, has been extensively applied in various deep learning applications to gain …
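For orientation, the linear interpolation described in the snippet above can be sketched in a few lines; this is a minimal NumPy illustration rather than the authors' implementation, and the function name mixup_batch, the Beta(alpha, alpha) mixing coefficient, and the pairing by random permutation are standard mixup conventions assumed here, not details taken from this paper.

```python
import numpy as np

def mixup_batch(x, y, alpha=1.0, rng=None):
    """Mix a batch of inputs x and one-hot labels y by convex combination."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)             # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))           # random pairing of examples in the batch
    x_mix = lam * x + (1.0 - lam) * x[perm]  # linear interpolation of inputs
    y_mix = lam * y + (1.0 - lam) * y[perm]  # matching interpolation of labels
    return x_mix, y_mix
```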

Initialization-dependent sample complexity of linear predictors and neural networks

R Magen, O Shamir - Advances in Neural Information …, 2023 - proceedings.neurips.cc
We provide several new results on the sample complexity of vector-valued linear predictors
(parameterized by a matrix), and more generally neural networks. Focusing on size …

Lower generalization bounds for GD and SGD in smooth stochastic convex optimization

P Zhang, J Teng, J Zhang - arXiv preprint arXiv:2303.10758, 2023 - arxiv.org
This work studies the generalization error of gradient methods. More specifically, we focus
on how training steps $T$ and step-size $\eta$ might affect generalization in smooth …

Implicit regularization of AdaDelta

M Englert, R Lazic, A Semler - Transactions on Machine …, 2024 - wrap.warwick.ac.uk
We consider the AdaDelta adaptive optimization algorithm on locally Lipschitz, positively
homogeneous, and o-minimally definable neural networks, with either the exponential or the …
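For context, the AdaDelta update analyzed here (Zeiler, 2012) keeps running averages of squared gradients and squared parameter updates and requires no explicit learning rate; below is a minimal NumPy sketch, where the function name adadelta_step and the default rho and eps values are illustrative choices, not details from this paper.

```python
import numpy as np

def adadelta_step(param, grad, state, rho=0.95, eps=1e-6):
    """One AdaDelta update: step size is the RMS of past updates over the RMS of gradients."""
    Eg2, Edx2 = state
    Eg2 = rho * Eg2 + (1 - rho) * grad**2                  # running average of squared gradients
    dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * grad  # rescaled descent step
    Edx2 = rho * Edx2 + (1 - rho) * dx**2                  # running average of squared updates
    return param + dx, (Eg2, Edx2)
```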

[Book][B] Feature Learning in Neural Networks and Other Stochastic Explorations

M Glasgow - 2024 - search.proquest.com
Recent years have offered ample empirical demonstration of the unprecedented success of deep learning.
Yet our theoretical understanding of why gradient descent succeeds in training neural …