Gradient starvation: A learning proclivity in neural networks

M Pezeshki, O Kaba, Y Bengio… - Advances in …, 2021 - proceedings.neurips.cc
We identify and formalize a fundamental gradient descent phenomenon resulting in a
learning proclivity in over-parameterized neural networks. Gradient Starvation arises when …

An empirical study of example forgetting during deep neural network learning

M Toneva, A Sordoni, RT Combes, A Trischler… - arXiv preprint arXiv …, 2018 - arxiv.org
Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics
of neural networks as they train on single classification tasks. Our goal is to understand …

Correct-n-contrast: A contrastive approach for improving robustness to spurious correlations

M Zhang, NS Sohoni, HR Zhang, C Finn… - arXiv preprint arXiv …, 2022 - arxiv.org
Spurious correlations pose a major challenge for robust machine learning. Models trained
with empirical risk minimization (ERM) may learn to rely on correlations between class …

Neural redshift: Random networks are not random functions

D Teney, AM Nicolicioiu, V Hartmann… - Proceedings of the …, 2024 - openaccess.thecvf.com
Our understanding of the generalization capabilities of neural networks (NNs) is still
incomplete. Prevailing explanations are based on implicit biases of gradient descent (GD) but …

Unsupervised state representation learning in atari

A Anand, E Racah, S Ozair, Y Bengio… - Advances in neural …, 2019 - proceedings.neurips.cc
State representation learning, or the ability to capture latent generative factors of an
environment, is crucial for building intelligent agents that can perform a wide variety of tasks …

On the foundations of shortcut learning

KL Hermann, H Mobahi, T Fel, MC Mozer - arXiv preprint arXiv …, 2023 - arxiv.org
Deep-learning models can extract a rich assortment of features from data. Which features a
model uses depends not only on predictivity--how reliably a feature indicates …

Understanding learning dynamics of language models with SVCCA

N Saphra, A Lopez - arXiv preprint arXiv:1811.00225, 2018 - arxiv.org
Research has shown that neural models implicitly encode linguistic features, but there has
been no research showing how these encodings arise as the models are trained …

Understanding visual feature reliance through the lens of complexity

T Fel, L Bethune, A Lampinen… - Advances in Neural …, 2025 - proceedings.neurips.cc
Recent studies suggest that deep learning models' inductive bias towards favoring simpler
features may be an origin of shortcut learning. Yet, there has been limited focus on …

The implicit bias of depth: How incremental learning drives generalization

D Gissin, S Shalev-Shwartz, A Daniely - arXiv preprint arXiv:1909.12051, 2019 - arxiv.org
A leading hypothesis for the surprising generalization of neural networks is that the
dynamics of gradient descent bias the model towards simple solutions, by searching through …

An investigation of critical issues in bias mitigation techniques

R Shrestha, K Kafle, C Kanan - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
A critical problem in deep learning is that systems learn inappropriate biases, resulting in
their inability to perform well on minority groups. This has led to the creation of multiple …