Inductive biases for deep learning of higher-level cognition
A fascinating hypothesis is that human and animal intelligence could be explained by a few
principles (rather than an encyclopaedic list of heuristics). If that hypothesis were correct, we …
Deep generative models in inversion: The impact of the generator's nonlinearity and development of a new approach based on a variational autoencoder
When solving inverse problems in geophysical imaging, deep generative models (DGMs)
may be used to constrain the solution to exhibit highly structured spatial patterns which are …
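As a way to make the latent-space inversion idea concrete, here is a minimal sketch (not the paper's method): a frozen nonlinear decoder G reparameterizes the unknown model, and the inversion becomes gradient descent on the latent vector. The decoder weights W_dec, the tanh nonlinearity, and the linear forward operator F are all hypothetical toy stand-ins; in practice G would be a trained VAE or GAN decoder and F a geophysical forward model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen "decoder" G: latent z (8-dim) -> model image m (100 cells).
W_dec = rng.normal(size=(100, 8)) / np.sqrt(8)
def G(z):
    return np.tanh(W_dec @ z)          # the generator's nonlinearity

# Hypothetical linear forward operator F (e.g., a tomography matrix) and noisy data.
F = rng.normal(size=(30, 100)) / np.sqrt(100)
z_true = rng.normal(size=8)
d_obs = F @ G(z_true) + 0.01 * rng.normal(size=30)

# Invert in latent space: minimize ||F G(z) - d_obs||^2 by gradient descent on z.
z = np.zeros(8)
lr = 0.01
for _ in range(5000):
    m = G(z)
    r = F @ m - d_obs                  # data residual
    # Chain rule through tanh: dm/dz = diag(1 - m^2) W_dec.
    grad_z = 2.0 * W_dec.T @ ((1.0 - m**2) * (F.T @ r))
    z -= lr * grad_z

print("final data misfit:", np.linalg.norm(F @ G(z) - d_obs))
```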
Bayesian deep learning and a probabilistic perspective of generalization
The key distinguishing property of a Bayesian approach is marginalization, rather than using
a single setting of weights. Bayesian marginalization can particularly improve the accuracy …
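To illustrate what marginalization means in practice, a minimal sketch follows: rather than predicting with a single weight setting, the predictive distribution is averaged over several weight samples. The toy logistic classifier, the MAP estimate w_map, and the perturbation-based "posterior samples" are hypothetical; real approximations would come from methods such as ensembling or stochastic-gradient MCMC.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_proba(w, X):
    """Toy logistic classifier: p(y=1 | x, w)."""
    return 1.0 / (1.0 + np.exp(-X @ w))

X_test = rng.normal(size=(5, 3))

# Single point estimate of the weights (e.g., a MAP solution) -- hypothetical values.
w_map = np.array([1.0, -2.0, 0.5])
p_point = predict_proba(w_map, X_test)

# Hypothetical approximate posterior samples: small perturbations of w_map.
w_samples = w_map + 0.5 * rng.normal(size=(20, 3))

# Bayesian model average: p(y | x, D) ~= (1/K) * sum_k p(y | x, w_k).
p_bma = np.mean([predict_proba(w, X_test) for w in w_samples], axis=0)

print("point-estimate probs:", np.round(p_point, 3))
print("marginalized probs:  ", np.round(p_bma, 3))
```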
Towards efficient and scalable sharpness-aware minimization
Recently, Sharpness-Aware Minimization (SAM), which connects the geometry of
the loss landscape with generalization, has demonstrated a significant performance boost …
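For context, here is a minimal sketch of the base SAM update (not this paper's efficiency or scalability improvements): perturb the weights toward the approximate worst case within an L2 ball of radius rho, then descend using the gradient taken at the perturbed point. The quadratic toy loss and the values of lr and rho are illustrative assumptions.

```python
import numpy as np

def loss_and_grad(w):
    # Toy loss: a poorly conditioned quadratic, f(w) = 0.5 * w^T A w.
    A = np.diag([10.0, 1.0])
    return 0.5 * w @ A @ w, A @ w

w = np.array([2.0, -3.0])
lr, rho = 0.05, 0.1

for step in range(100):
    _, g = loss_and_grad(w)
    # Ascent step to the approximate sharpest point within the rho-ball.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    _, g_sharp = loss_and_grad(w + eps)
    # Descend using the gradient evaluated at the perturbed weights.
    w = w - lr * g_sharp

print("final weights:", w, "final loss:", loss_and_grad(w)[0])
```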
Fantastic generalization measures and where to find them
Generalization of deep networks has been of great interest in recent years, resulting in a
number of theoretically and empirically motivated complexity measures. However, most …
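As one concrete example of the kind of complexity measure studied in this line of work, the sketch below computes a simple norm-based quantity, the product of per-layer spectral norms. The random weight matrices are hypothetical placeholders for a trained network's layers, and this is just one of many candidate measures, not a recommendation from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer weight matrices standing in for a trained 784-128-64-10 network.
layers = [rng.normal(size=(128, 784)) / 28.0,
          rng.normal(size=(64, 128)) / 12.0,
          rng.normal(size=(10, 64)) / 8.0]

# Spectral norm = largest singular value of each layer's weight matrix.
spectral_norms = [np.linalg.norm(W, ord=2) for W in layers]
complexity = np.prod(spectral_norms)

print("per-layer spectral norms:", np.round(spectral_norms, 2))
print("product-of-norms complexity measure:", round(complexity, 2))
```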
Linear mode connectivity and the lottery ticket hypothesis
We study whether a neural network optimizes to the same, linearly connected minimum
under different samples of SGD noise (e.g., random data order and augmentation). We find …
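The underlying check can be sketched in a few lines: evaluate the loss along the straight line between two solutions and look for a barrier. The toy loss and the two "trained" weight vectors below are hypothetical stand-ins for networks trained from a shared initialization under different SGD noise.

```python
import numpy as np

def loss(w):
    # Toy non-convex loss with several nearby basins.
    return np.sum(np.sin(3.0 * w) + 0.1 * w**2)

rng = np.random.default_rng(0)
w_a = rng.normal(size=10)                 # solution from SGD run A (hypothetical)
w_b = w_a + 0.3 * rng.normal(size=10)     # solution from SGD run B (hypothetical)

# Evaluate the loss along the linear path (1 - alpha) * w_a + alpha * w_b.
alphas = np.linspace(0.0, 1.0, 11)
path_losses = [loss((1 - a) * w_a + a * w_b) for a in alphas]

# The barrier is how far the path rises above the worse of the two endpoints.
barrier = max(path_losses) - max(path_losses[0], path_losses[-1])
print("losses along the path:", np.round(path_losses, 3))
print("barrier height:", round(barrier, 3))
```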
Revisiting small batch training for deep neural networks
D. Masters, C. Luschi - arXiv preprint arXiv:1804.07612, 2018 - arxiv.org
Modern deep neural network training is typically based on mini-batch stochastic gradient
optimization. While the use of large mini-batches increases the available computational …
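For readers who want the object of study spelled out, here is a minimal mini-batch SGD loop on a toy linear-regression problem with a small batch size. The batch size, learning rate, and epoch count are illustrative choices, not the paper's recommendations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1024, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
batch_size, lr, epochs = 8, 0.05, 20      # small mini-batches (illustrative)

for epoch in range(epochs):
    perm = rng.permutation(n)             # fresh random data order each epoch
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]
        # Mini-batch gradient of the squared-error loss.
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
        w -= lr * grad

print("recovery error:", np.linalg.norm(w - w_true))
```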
Wireless network intelligence at the edge
Fueled by the availability of more data and computing power, recent breakthroughs in cloud-
based machine learning (ML) have transformed every aspect of our lives from face …
signSGD: Compressed optimisation for non-convex problems
Training large neural networks requires distributing learning across multiple workers, where
the cost of communicating gradients can be a significant bottleneck. signSGD alleviates this …
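A minimal sketch of the signSGD idea follows: each worker communicates only the sign of its stochastic gradient (one bit per parameter), and the server aggregates the signs by majority vote. The toy quadratic loss, the simulated workers, and the step size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(w, noise_scale=0.5):
    # Noisy gradient of a toy quadratic loss, standing in for a mini-batch gradient.
    return w + noise_scale * rng.normal(size=w.shape)

w = rng.normal(size=10) * 5.0
lr, n_workers = 0.05, 5

for step in range(500):
    # Each worker sends only the sign of its stochastic gradient (1 bit per parameter).
    worker_signs = np.stack([np.sign(grad(w)) for _ in range(n_workers)])
    # The server aggregates by majority vote and broadcasts the voted sign.
    voted_sign = np.sign(worker_signs.sum(axis=0))
    w -= lr * voted_sign

print("distance to optimum:", np.linalg.norm(w))
```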
Super-convergence: Very fast training of neural networks using large learning rates
In this paper, we describe a phenomenon, which we named “super-convergence”, where
neural networks can be trained an order of magnitude faster than with standard training …
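A schedule of the kind associated with super-convergence can be sketched as a one-cycle learning-rate policy: ramp the learning rate up to a large peak, back down over a single cycle, and finish with a short annealing tail. The function name one_cycle_lr, the peak value, and the cycle length below are illustrative assumptions, not prescriptions from the paper.

```python
import numpy as np

def one_cycle_lr(step, total_steps, lr_min=0.01, lr_max=1.0, final_frac=0.1):
    """Piecewise-linear one-cycle schedule: ramp up, ramp down, short annealing tail."""
    cycle_steps = int(total_steps * (1.0 - final_frac))
    half = cycle_steps // 2
    if step < half:                                    # ramp up to the peak
        return lr_min + (lr_max - lr_min) * step / half
    if step < cycle_steps:                             # ramp back down
        return lr_max - (lr_max - lr_min) * (step - half) / (cycle_steps - half)
    # Final annealing phase: decay below lr_min for the remaining steps.
    tail = (step - cycle_steps) / max(total_steps - cycle_steps, 1)
    return lr_min * (1.0 - 0.9 * tail)

total = 1000
lrs = [one_cycle_lr(t, total) for t in range(total)]
print("lr at start/middle/end:",
      round(lrs[0], 3), round(lrs[total // 2], 3), round(lrs[-1], 4))
```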