Fishr: Invariant gradient variances for out-of-distribution generalization
Learning robust models that generalize well under changes in the data distribution is critical
for real-world applications. To this end, there has been growing interest in learning …
Estimating example difficulty using variance of gradients
In machine learning, a question of great interest is understanding what examples are
challenging for a model to classify. Identifying atypical examples ensures the safe …
An experimental study of byzantine-robust aggregation schemes in federated learning
Byzantine-robust federated learning aims at mitigating Byzantine failures during the
federated training process, where malicious participants (known as Byzantine clients) may …
Embarrassingly simple dataset distillation
Training large-scale models generally requires enormous amounts of training data.
Dataset distillation aims to extract a small set of synthetic training samples from a large …
On the limitations of compute thresholds as a governance strategy
S Hooker - arxiv preprint arxiv:2407.05694, 2024 - arxiv.org
At face value, this essay is about understanding a fairly esoteric governance tool called
compute thresholds. However, in order to grapple with whether these thresholds will achieve …
On the generalization of models trained with SGD: Information-theoretic bounds and implications
This paper follows up on a recent work of Neu et al. (2021) and presents some new
information-theoretic upper bounds for the generalization error of machine learning models …
A tale of two long tails
As machine learning models are increasingly employed to assist human decision-makers, it
becomes critical to communicate the uncertainty associated with these model predictions …
Low-variance Forward Gradients using Direct Feedback Alignment and momentum
F Bacho, D Chu - Neural Networks, 2024 - Elsevier
Supervised learning in deep neural networks is commonly performed using error
backpropagation. However, the sequential propagation of errors during the backward pass …
Jointly-learnt exit and inference for dynamic neural networks
J Chataoui - 2024 - escholarship.mcgill.ca
Dynamic early-exit artificial neural networks (RNDSA) aim to reduce the cost of
predictions by skipping the network's deepest layers for …
Simigrad: Fine-grained adaptive batching for large scale training using gradient similarity measurement
Large scale training requires massive parallelism to finish the training within a reasonable
amount of time. To support massive parallelism, large batch training is the key enabler but …