To compress or not to compress—self-supervised learning and information theory: A review

R Shwartz Ziv, Y LeCun - Entropy, 2024 - mdpi.com
Deep neural networks excel in supervised learning tasks but are constrained by the need for
extensive labeled data. Self-supervised learning emerges as a promising alternative …

Generalization bounds: Perspectives from information theory and PAC-Bayes

F Hellström, G Durisi, B Guedj… - … and Trends® in …, 2025 - nowpublishers.com
A fundamental question in theoretical machine learning is generalization. Over the past
decades, the PAC-Bayesian approach has been established as a flexible framework to …

Control batch size and learning rate to generalize well: Theoretical and empirical evidence

F He, T Liu, D Tao - Advances in neural information …, 2019 - proceedings.neurips.cc
Deep neural networks have achieved dramatic success with the optimization method of
stochastic gradient descent (SGD). However, it is still not clear how to tune hyper …

Recent advances in deep learning theory

F He, D Tao - arXiv preprint arXiv:2012.10931, 2020 - arxiv.org
Deep learning is usually described as an experiment-driven field under continual criticism
for lacking theoretical foundations. This problem has been partially addressed by a large volume of …

On the power of over-parametrization in neural networks with quadratic activation

S Du, J Lee - International conference on machine learning, 2018 - proceedings.mlr.press
We provide new theoretical insights into why over-parametrization is effective in learning
neural networks. For a $k$-hidden-node shallow network with quadratic activation and $n$ …

Tightening mutual information-based bounds on generalization error

Y Bu, S Zou, VV Veeravalli - IEEE Journal on Selected Areas in …, 2020 - ieeexplore.ieee.org
An information-theoretic upper bound on the generalization error of supervised learning
algorithms is derived. The bound is constructed in terms of the mutual information between …
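
For context (not part of the indexed abstract), the bound this line of work refines is usually stated, under a $\sigma$-sub-Gaussian loss assumption, in the full-sample form of Xu and Raginsky (2017) together with the per-sample tightening associated with Bu, Zou, and Veeravalli; a sketch of both forms:

$$ \bigl|\mathbb{E}[\mathrm{gen}(W, S)]\bigr| \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(W; S)} \qquad\text{and}\qquad \bigl|\mathbb{E}[\mathrm{gen}(W, S)]\bigr| \;\le\; \frac{1}{n}\sum_{i=1}^{n} \sqrt{2\sigma^2\, I(W; Z_i)}, $$

where $W$ is the learned hypothesis, $S = (Z_1, \dots, Z_n)$ the training sample, and $I(\cdot\,;\cdot)$ mutual information; exact constants and conditions should be checked against the paper itself.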

Information-theoretic generalization bounds for SGLD via data-dependent estimates

J Negrea, M Haghifam, GK Dziugaite… - Advances in …, 2019 - proceedings.neurips.cc
In this work, we improve upon the stepwise analysis of noisy iterative learning algorithms
initiated by Pensia, Jog, and Loh (2018) and recently extended by Bu, Zou, and Veeravalli …

Sharpened generalization bounds based on conditional mutual information and an application to noisy, iterative algorithms

M Haghifam, J Negrea, A Khisti… - Advances in …, 2020 - proceedings.neurips.cc
The information-theoretic framework of Russo and Zou (2016) and Xu and Raginsky (2017)
provides bounds on the generalization error of a learning algorithm in terms of the mutual …
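
For comparison (again a sketch, not quoted from the abstract), the conditional-mutual-information variant introduced by Steinke and Zakynthinou (2020), which this paper sharpens, replaces $I(W;S)$ with a conditional quantity and applies to losses bounded in $[0,1]$:

$$ \bigl|\mathbb{E}[\mathrm{gen}(W, S)]\bigr| \;\le\; \sqrt{\frac{2}{n}\, I\bigl(W; U \mid \tilde{Z}\bigr)}, $$

where $\tilde{Z}$ is a supersample of $2n$ points and $U$ the random indices selecting the $n$ training points from it; the precise statement should be verified against the original papers.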

Information-theoretic generalization bounds for stochastic gradient descent

G Neu, GK Dziugaite, M Haghifam… - … on Learning Theory, 2021 - proceedings.mlr.press
We study the generalization properties of the popular stochastic optimization method known
as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our …

Topological generalization bounds for discrete-time stochastic optimization algorithms

R Andreeva, B Dupuis, R Sarkar… - Advances in Neural …, 2025 - proceedings.neurips.cc
We present a novel set of rigorous and computationally efficient topology-based complexity
notions that exhibit a strong correlation with the generalization gap in modern deep neural …