Inductive biases for deep learning of higher-level cognition

A Goyal, Y Bengio - Proceedings of the Royal Society A, 2022 - royalsocietypublishing.org
A fascinating hypothesis is that human and animal intelligence could be explained by a few
principles (rather than an encyclopaedic list of heuristics). If that hypothesis was correct, we …

Deep generative models in inversion: The impact of the generator's nonlinearity and development of a new approach based on a variational autoencoder

J Lopez-Alvis, E Laloy, F Nguyen, T Hermans - Computers & Geosciences, 2021 - Elsevier
When solving inverse problems in geophysical imaging, deep generative models (DGMs)
may be used to enforce the solution to display highly structured spatial patterns which are …
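 
A minimal sketch of the general idea behind this line of work: a pretrained generator maps a low-dimensional latent vector to a structured model, and the inverse problem is solved by optimizing in the latent space. Everything here is a stand-in (a random linear "decoder" and forward operator rather than a trained VAE and a geophysical solver), just to show the latent-space inversion loop.

```python
# Toy latent-space inversion: minimize ||F(G(z)) - d_obs||^2 over z.
# G and F are random linear stand-ins for a trained decoder and a
# physical forward operator; nothing here is the paper's actual setup.
import numpy as np

rng = np.random.default_rng(0)
latent_dim, model_dim, data_dim = 8, 64, 32

G = rng.normal(size=(model_dim, latent_dim))   # stand-in decoder: m = G @ z
F = rng.normal(size=(data_dim, model_dim))     # stand-in forward operator: d = F @ m

z_true = rng.normal(size=latent_dim)
d_obs = F @ G @ z_true + 0.01 * rng.normal(size=data_dim)  # noisy observations

A = F @ G                                      # combined operator: data = A @ z
lr = 1.0 / np.linalg.norm(A, 2) ** 2           # safe step size for gradient descent

z = np.zeros(latent_dim)
for _ in range(2000):
    residual = A @ z - d_obs
    z -= lr * (A.T @ residual)                 # gradient of 0.5 * ||A z - d_obs||^2

print("latent misfit:", np.linalg.norm(z - z_true))
```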

Bayesian deep learning and a probabilistic perspective of generalization

AG Wilson, P Izmailov - Advances in neural information …, 2020 - proceedings.neurips.cc
The key distinguishing property of a Bayesian approach is marginalization, rather than using
a single setting of weights. Bayesian marginalization can particularly improve the accuracy …
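 
A minimal sketch of what marginalization means in practice, under my own toy assumptions (a linear classifier and random weight draws standing in for posterior samples): the predictive distribution is an average of per-sample predictions, p(y|x) ≈ (1/S) Σ_s p(y|x, w_s), rather than the prediction of any single weight setting.

```python
# Toy Bayesian model averaging: average class probabilities over sampled
# weights instead of committing to one weight setting.
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict(x, w):
    """Toy linear classifier: class probabilities from logits x @ w."""
    return softmax(x @ w)

x = rng.normal(size=(5, 10))                                     # 5 inputs, 10 features
weight_samples = [rng.normal(size=(10, 3)) for _ in range(20)]   # stand-in posterior samples

p_single = predict(x, weight_samples[0])                          # one weight setting
p_bma = np.mean([predict(x, w) for w in weight_samples], axis=0)  # marginalized prediction

print("single-model predictions:\n", p_single.round(3))
print("marginalized predictions:\n", p_bma.round(3))
```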

Towards efficient and scalable sharpness-aware minimization

Y Liu, S Mai, X Chen, CJ Hsieh… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Recently, Sharpness-Aware Minimization (SAM), which connects the geometry of
the loss landscape and generalization, has demonstrated a significant performance boost …
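 
A minimal sketch of the basic SAM update (my own toy example, not the paper's efficient variants): each step first perturbs the weights toward the locally worst-case point within an L2 ball of radius rho, then applies the gradient computed at that perturbed point to the original weights.

```python
# SAM-style ascent-then-descent step on a toy non-convex loss, using a
# finite-difference gradient to keep the sketch dependency-free.
import numpy as np

def loss(w):
    return 0.5 * np.sum(w ** 2) + np.sin(3.0 * w).sum()

def grad(w, eps=1e-5):
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w); e[i] = eps
        g[i] = (loss(w + e) - loss(w - e)) / (2 * eps)
    return g

w = np.array([2.0, -1.5, 0.7])
lr, rho = 0.05, 0.1
for _ in range(200):
    g = grad(w)
    eps_hat = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent to the "sharp" neighbour
    g_sam = grad(w + eps_hat)                        # gradient at the perturbed point
    w -= lr * g_sam                                  # descent applied to the original weights

print("final weights:", w, "loss:", loss(w))
```

Note that each SAM step costs two gradient evaluations, which is exactly the overhead the efficiency-oriented follow-ups such as this paper try to reduce.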

Fantastic generalization measures and where to find them

Y Jiang, B Neyshabur, H Mobahi, D Krishnan… - arXiv preprint arXiv …, 2019 - arxiv.org
Generalization of deep networks has been of great interest in recent years, resulting in a
number of theoretically and empirically motivated complexity measures. However, most …
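 
For concreteness, one family of measures studied in this line of work is norm-based: for example, a product of layer spectral norms normalized by the output margin. The sketch below is my own illustration on a random two-layer network and random data, just to show how such a quantity is computed; it is not the paper's evaluation protocol.

```python
# Toy norm/margin complexity measure on a random two-layer ReLU network.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(100, 50)) / np.sqrt(50)
W2 = rng.normal(size=(10, 100)) / np.sqrt(100)

def spectral_norm(W):
    return np.linalg.svd(W, compute_uv=False)[0]

x = rng.normal(size=(32, 50))
labels = rng.integers(0, 10, size=32)
logits = np.maximum(x @ W1.T, 0) @ W2.T          # two-layer ReLU forward pass

# Margin: correct-class logit minus the largest competing logit, per example.
correct = logits[np.arange(32), labels]
masked = logits.copy()
masked[np.arange(32), labels] = -np.inf
margins = correct - masked.max(axis=1)
margin = np.percentile(margins, 10)              # a low-quantile margin

complexity = spectral_norm(W1) * spectral_norm(W2) / max(margin, 1e-6)
print("spectral-norm/margin complexity:", complexity)
```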

Linear mode connectivity and the lottery ticket hypothesis

J Frankle, GK Dziugaite, D Roy… - … on Machine Learning, 2020 - proceedings.mlr.press
We study whether a neural network optimizes to the same, linearly connected minimum
under different samples of SGD noise (eg, random data order and augmentation). We find …
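 
A minimal sketch of the linear mode connectivity check itself, under toy assumptions (two minima of a small analytic loss stand in for two networks trained with different SGD noise): evaluate the loss along the straight line between the two solutions and report the barrier height above the endpoints.

```python
# Linear interpolation between two solutions and the resulting loss barrier.
import numpy as np

def loss(w):
    return np.sum((w ** 2 - 1.0) ** 2)  # toy loss with multiple minima

w_a = np.array([1.0, 1.0, 1.0])    # stand-in for run A's weights
w_b = np.array([1.0, 1.0, -1.0])   # stand-in for run B's weights

alphas = np.linspace(0.0, 1.0, 21)
path_losses = np.array([loss((1 - a) * w_a + a * w_b) for a in alphas])

endpoints = 0.5 * (path_losses[0] + path_losses[-1])
barrier = path_losses.max() - endpoints
print("loss barrier along the linear path:", barrier)
```

A barrier near zero is what the paper calls linear mode connectivity; a large barrier means the two runs ended in different, not linearly connected, minima.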

Revisiting small batch training for deep neural networks

D Masters, C Luschi - arXiv preprint arXiv:1804.07612, 2018 - arxiv.org
Modern deep neural network training is typically based on mini-batch stochastic gradient
optimization. While the use of large mini-batches increases the available computational …
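 
A minimal sketch of the quantity being studied, on my own toy problem (linear regression rather than a deep network): the batch size is the knob, trading noisier gradients against more parameter updates per epoch.

```python
# Mini-batch SGD on linear regression with the batch size as a parameter.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1024, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def train(batch_size, lr=0.05, epochs=20):
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= lr * grad
    return np.linalg.norm(w - w_true)

for m in (8, 64, 512):
    print(f"batch size {m:4d} -> parameter error {train(m):.4f}")
```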

Wireless network intelligence at the edge

J Park, S Samarakoon, M Bennis… - Proceedings of the …, 2019 - ieeexplore.ieee.org
Fueled by the availability of more data and computing power, recent breakthroughs in cloud-
based machine learning (ML) have transformed every aspect of our lives from face …

signSGD: Compressed optimisation for non-convex problems

J Bernstein, YX Wang… - International …, 2018 - proceedings.mlr.press
Training large neural networks requires distributing learning across multiple workers, where
the cost of communicating gradients can be a significant bottleneck. signSGD alleviates this …
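 
A minimal sketch of the signSGD idea under toy assumptions (a quadratic objective and synthetic gradient noise): each worker transmits only the sign of its stochastic gradient, one bit per coordinate, and the server aggregates by majority vote before applying a signed update.

```python
# Toy distributed signSGD with majority-vote aggregation.
import numpy as np

rng = np.random.default_rng(0)
d, n_workers = 20, 5
w_true = rng.normal(size=d)

def worker_gradient(w):
    # Noisy gradient of 0.5 * ||w - w_true||^2, as seen by one worker.
    return (w - w_true) + 0.5 * rng.normal(size=d)

w = np.zeros(d)
lr = 0.01
for _ in range(1000):
    signs = np.stack([np.sign(worker_gradient(w)) for _ in range(n_workers)])
    vote = np.sign(signs.sum(axis=0))   # majority vote over workers
    w -= lr * vote                      # signed (1-bit per coordinate) update

print("distance to optimum:", np.linalg.norm(w - w_true))
```

Because only signs are communicated, the per-step payload shrinks from 32 bits per coordinate to 1, which is the communication bottleneck the method targets.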

Super-convergence: Very fast training of neural networks using large learning rates

LN Smith, N Topin - … and machine learning for multi-domain …, 2019 - spiedigitallibrary.org
In this paper, we describe a phenomenon, which we named “super-convergence”, where
neural networks can be trained an order of magnitude faster than with standard training …
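 
A minimal sketch of the one-cycle learning-rate schedule that the super-convergence regime relies on, shown on a toy quadratic; the peak learning rate, cycle length, and problem are my own arbitrary choices, not the paper's settings.

```python
# One-cycle schedule: linear warm-up to a large peak rate, then linear decay.
import numpy as np

def one_cycle_lr(step, total_steps, lr_min=0.01, lr_max=1.0):
    """Warm up linearly for the first half of training, decay linearly after."""
    half = total_steps // 2
    if step < half:
        return lr_min + (lr_max - lr_min) * step / half
    return lr_max - (lr_max - lr_min) * (step - half) / (total_steps - half)

rng = np.random.default_rng(0)
w_true = rng.normal(size=10)
w = np.zeros(10)
total_steps = 200
for t in range(total_steps):
    grad = w - w_true + 0.05 * rng.normal(size=10)  # noisy quadratic gradient
    w -= one_cycle_lr(t, total_steps) * grad

print("distance to optimum:", np.linalg.norm(w - w_true))
```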