Fishr: Invariant gradient variances for out-of-distribution generalization

A Rame, C Dancette, M Cord - International Conference on …, 2022 - proceedings.mlr.press
Learning robust models that generalize well under changes in the data distribution is critical
for real-world applications. To this end, there has been a surge of interest in learning …
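
The title names a concrete mechanism: penalizing mismatch in the domain-wise variance of gradients. Below is a minimal sketch of such a penalty, assuming per-example gradients (e.g., of the classifier head) are already materialized; the function name and exact penalty form are illustrative, not the paper's reference code.

```python
# Minimal sketch of a Fishr-style penalty (an assumed form for illustration,
# not the authors' reference implementation): match the per-domain variance
# of per-example gradients across training domains.
import torch

def fishr_penalty(per_example_grads):
    """per_example_grads: list with one (n_d, p) tensor per training domain,
    rows holding per-example gradients of the loss."""
    variances = [g.var(dim=0) for g in per_example_grads]      # (p,) per domain
    mean_var = torch.stack(variances).mean(dim=0)
    # Penalize each domain's gradient variance for drifting from the average.
    return sum(((v - mean_var) ** 2).sum() for v in variances)
```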

Estimating example difficulty using variance of gradients

C Agarwal, D D'souza… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
In machine learning, a question of great interest is understanding what examples are
challenging for a model to classify. Identifying atypical examples ensures the safe …
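
The method behind this title, variance of gradients (VoG), scores each example by how much its gradients fluctuate over training. A hedged sketch, assuming input-space gradients for one example were collected at K checkpoints; the paper's normalization details are omitted.

```python
# Variance-of-gradients (VoG) difficulty score for a single example,
# given gradients collected at K training checkpoints.
import numpy as np

def vog_score(grads):
    """grads: (K, d) array, one flattened input gradient per checkpoint."""
    var = grads.var(axis=0)   # per-dimension variance across checkpoints
    return var.mean()         # scalar score: higher = harder / more atypical
```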

An experimental study of byzantine-robust aggregation schemes in federated learning

S Li, ECH Ngai, T Voigt - IEEE Transactions on Big Data, 2023 - ieeexplore.ieee.org
Byzantine-robust federated learning aims at mitigating Byzantine failures during the
federated training process, where malicious participants (known as Byzantine clients) may …
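
For context, one classical aggregator in the family such studies compare is the coordinate-wise median; the sketch below is a generic illustration, not the paper's evaluation code.

```python
# Coordinate-wise median aggregation over client updates. Unlike the plain
# mean, which a single malicious client can drag arbitrarily far, the
# per-coordinate median tolerates a minority of corrupted rows.
import numpy as np

def coordinatewise_median(client_updates):
    """client_updates: (n_clients, p) array of model updates."""
    return np.median(client_updates, axis=0)
```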

Embarrassingly simple dataset distillation

Y Feng, S Vedantam, J Kempe - 2023 - par.nsf.gov
Training large-scale models generally requires enormous amounts of training data.
Dataset distillation aims to extract a small set of synthetic training samples from a large …
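
As orientation, one common formulation in this area is gradient matching: learn synthetic examples whose training gradient mimics that of real data. The sketch below illustrates that generic family only; it is not this paper's specific recipe, which instead differentiates through the inner training run.

```python
# Illustrative gradient-matching objective for dataset distillation.
# Minimizing it w.r.t. x_syn (a learnable tensor, e.g. optimized with Adam)
# shapes the synthetic set so its gradients imitate real-data gradients.
import torch

def matching_loss(model, x_real, y_real, x_syn, y_syn):
    ce = torch.nn.functional.cross_entropy
    g_real = [g.detach() for g in torch.autograd.grad(
        ce(model(x_real), y_real), model.parameters())]
    g_syn = torch.autograd.grad(
        ce(model(x_syn), y_syn), model.parameters(), create_graph=True)
    # Squared distance between real and synthetic gradients, summed over layers.
    return sum(((a - b) ** 2).sum() for a, b in zip(g_real, g_syn))
```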

On the limitations of compute thresholds as a governance strategy

S Hooker - arXiv preprint arXiv:2407.05694, 2024 - arxiv.org
At face value, this essay is about understanding a fairly esoteric governance tool called
compute thresholds. However, in order to grapple with whether these thresholds will achieve …

On the generalization of models trained with SGD: Information-theoretic bounds and implications

Z Wang, Y Mao - arXiv preprint arXiv:2110.03128, 2021 - arxiv.org
This paper follows up on a recent work of Neu et al. (2021) and presents some new
information-theoretic upper bounds for the generalization error of machine learning models …
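
The snippet cuts off before the bounds themselves. For orientation, the canonical mutual-information bound that this line of work refines for SGD iterates is the one of Xu & Raginsky (2017):

```latex
% Canonical information-theoretic generalization bound; the loss is assumed
% \sigma-sub-Gaussian under the data distribution for every weight vector.
\[
  \bigl|\mathbb{E}\,[L_\mu(W) - L_S(W)]\bigr|
  \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(S; W)},
\]
% where $S$ is the $n$-sample training set, $W$ the learned weights, and
% $I(S;W)$ the mutual information between them.
```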

A tale of two long tails

D D'souza, Z Nussbaum, C Agarwal… - arXiv preprint arXiv …, 2021 - arxiv.org
As machine learning models are increasingly employed to assist human decision-makers, it
becomes critical to communicate the uncertainty associated with these model predictions …

Low-variance Forward Gradients using Direct Feedback Alignment and momentum

F Bacho, D Chu - Neural Networks, 2024 - Elsevier
Supervised learning in deep neural networks is commonly performed using error
backpropagation. However, the sequential propagation of errors during the backward pass …
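
The backdrop here is the forward-gradient estimator, which avoids a backward pass entirely; the sketch below shows that plain baseline (using torch.func, PyTorch >= 2.0), not the paper's DFA-plus-momentum variance reduction.

```python
# Plain forward-gradient step: a single forward-mode pass yields an unbiased
# but high-variance gradient estimate, with no backpropagation.
import torch
from torch.func import jvp

def forward_gradient(f, params):
    """f: scalar loss taking a tuple of parameter tensors."""
    v = tuple(torch.randn_like(p) for p in params)   # random tangent direction
    _, dirderiv = jvp(f, (params,), (v,))            # directional derivative of f along v
    # (grad(f) . v) v: unbiased since E[v v^T] = I for standard normal v.
    return tuple(dirderiv * vi for vi in v)
```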

Jointly-learnt exit and inference for dynamic neural networks

J Chataoui - 2024 - escholarship.mcgill.ca
Early-exit dynamic artificial neural networks (RNDSA) aim to reduce the cost of predictions
by skipping the network's deepest layers in order to …

SimiGrad: Fine-grained adaptive batching for large scale training using gradient similarity measurement

H Qin, S Rajbhandari, O Ruwase… - Advances in Neural …, 2021 - proceedings.neurips.cc
Large-scale training requires massive parallelism to finish within a reasonable
amount of time. Large batch training is the key enabler of massive parallelism, but …
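
The core signal named in the title can be illustrated by comparing the gradients of two halves of a batch; this is a hedged sketch of that measurement only, and SimiGrad's actual criterion and batch-size schedule may differ in detail.

```python
# Cosine similarity between gradients of the two halves of a batch. High
# similarity suggests redundancy (the batch could grow); low similarity
# suggests noisy gradients, favoring a smaller batch.
import torch

def half_batch_cosine(model, loss_fn, x, y):
    half = x.shape[0] // 2
    grads = []
    for xs, ys in ((x[:half], y[:half]), (x[half:], y[half:])):
        gs = torch.autograd.grad(loss_fn(model(xs), ys), model.parameters())
        grads.append(torch.cat([g.flatten() for g in gs]))
    return torch.nn.functional.cosine_similarity(grads[0], grads[1], dim=0)
```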