Machine learning and the physical sciences

G Carleo, I Cirac, K Cranmer, L Daudet, M Schuld… - Reviews of Modern …, 2019 - APS
Machine learning (ML) encompasses a broad range of algorithms and modeling tools used
for a vast array of data processing tasks, which has entered most scientific disciplines in …

Statistical mechanics of deep learning

Y Bahri, J Kadmon, J Pennington… - Annual Review of …, 2020 - annualreviews.org
The recent striking success of deep neural networks in machine learning raises profound
questions about the theoretical principles underlying their success. For example, what can …

XNOR-Net: ImageNet classification using binary convolutional neural networks

M Rastegari, V Ordonez, J Redmon… - European conference on …, 2016 - Springer
We propose two efficient approximations to standard convolutional neural networks: Binary-
Weight-Networks and XNOR-Networks. In Binary-Weight-Networks, the filters are …
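
The core approximation in Binary-Weight-Networks is to replace each real-valued filter W by alpha * sign(W), with the scaling factor alpha set to the mean absolute weight of the filter. A minimal NumPy sketch of this binarization (the function name and the per-output-channel reduction are illustrative assumptions, not the authors' code):

    import numpy as np

    def binarize_filter(W):
        # approximate W by alpha * sign(W); alpha = mean |W| of each output filter
        alpha = np.mean(np.abs(W), axis=tuple(range(1, W.ndim)), keepdims=True)
        B = np.where(W >= 0, 1.0, -1.0)
        return alpha, B

    # usage: alpha, B = binarize_filter(conv_weights)  ->  conv_weights ~ alpha * B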

Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or −1

M Courbariaux, I Hubara, D Soudry, R El-Yaniv… - arXiv preprint arXiv …, 2016 - arxiv.org
We introduce a method to train Binarized Neural Networks (BNNs): neural networks with
binary weights and activations at run-time. At training-time the binary weights and activations …
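
At train-time, BNNs keep real-valued weights and push gradients through the sign function with a straight-through estimator. A minimal PyTorch-style sketch of that estimator (the class name is illustrative and this is a simplified reading of the method, not the released code):

    import torch

    class SignSTE(torch.autograd.Function):
        # forward: binarize to +1 / -1; backward: straight-through estimator
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            # pass the gradient through only where |x| <= 1 (hard-tanh clipping)
            return grad_out * (x.abs() <= 1).to(grad_out.dtype)

    # usage inside a layer's forward pass: w_bin = SignSTE.apply(w_real)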

Quantized neural networks: Training neural networks with low precision weights and activations

I Hubara, M Courbariaux, D Soudry, R El-Yaniv… - Journal of Machine …, 2018 - jmlr.org
We introduce a method to train Quantized Neural Networks (QNNs): neural networks with
extremely low precision (e.g., 1-bit) weights and activations at run-time. At train-time the …
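
Beyond the 1-bit case, the general recipe is to map weights and activations onto a small set of evenly spaced levels. A generic k-bit uniform quantizer in NumPy (a sketch of the idea, not the specific scheme of the paper):

    import numpy as np

    def uniform_quantize(x, bits=4):
        # round x onto 2**bits evenly spaced levels spanning its observed range
        lo, hi = float(x.min()), float(x.max())
        n_steps = 2 ** bits - 1
        scale = (hi - lo) / n_steps if hi > lo else 1.0
        return np.round((x - lo) / scale) * scale + lo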

Binarized neural networks

I Hubara, M Courbariaux, D Soudry… - Advances in neural …, 2016 - proceedings.neurips.cc
We introduce a method to train Binarized Neural Networks (BNNs): neural networks with
binary weights and activations at run-time. At train-time the binary weights and activations …

Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data

GK Dziugaite, DM Roy - arXiv preprint arXiv:1703.11008, 2017 - arxiv.org
One of the defining properties of deep learning is that models are chosen to have many
more parameters than available training data. In light of this capacity for overfitting, it is …
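
The bounds computed here are PAC-Bayesian; one standard form (Langford-Seeger) says that with probability at least 1 - delta over an i.i.d. sample S of size m, for every posterior Q over network weights,

    \mathrm{kl}\!\left( \hat{e}_S(Q) \,\|\, e_D(Q) \right) \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{m}}{\delta}}{m},

where P is a prior fixed before seeing the data, \hat{e}_S(Q) and e_D(Q) are the empirical and true error rates of the randomized classifier, and kl is the binary relative entropy. Exact constants vary by version; this is a representative form rather than the specific bound the paper optimizes.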

Entropy-SGD: Biasing gradient descent into wide valleys

P Chaudhari, A Choromanska, S Soatto… - Journal of Statistical …, 2019 - iopscience.iop.org
This paper proposes a new optimization algorithm called Entropy-SGD for training deep
neural networks that is motivated by the local geometry of the energy landscape. Local …
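
Entropy-SGD replaces the loss f(x) with a smoothed "local entropy" objective whose gradient favors wide valleys and is estimated with an inner Langevin (SGLD) loop. A heavily simplified NumPy sketch of one outer step (hyperparameter names, the number of inner steps, and the running-average constant are illustrative assumptions, not the authors' settings):

    import numpy as np

    def entropy_sgd_step(x, grad_f, eta=0.1, gamma=1e-2, inner_steps=5,
                         sgld_lr=0.01, noise=1e-4, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        xp, mu = x.copy(), x.copy()
        for _ in range(inner_steps):
            # SGLD samples around x from exp(-f(x') - gamma/2 ||x' - x||^2)
            g = grad_f(xp) + gamma * (xp - x)
            xp = xp - sgld_lr * g + np.sqrt(sgld_lr) * noise * rng.standard_normal(x.shape)
            mu = 0.75 * mu + 0.25 * xp      # running estimate of the local mean
        # the local-entropy gradient is approximately gamma * (x - mu)
        return x - eta * gamma * (x - mu)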

Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks

P Chaudhari, S Soatto - 2018 Information Theory and …, 2018 - ieeexplore.ieee.org
Stochastic gradient descent (SGD) is widely believed to perform implicit regularization when
used to train deep neural networks, but the precise manner in which this occurs has thus far …
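
The analysis treats small-learning-rate SGD as a stochastic differential equation driven by minibatch gradient noise, roughly

    dx_t = -\nabla f(x_t)\, dt + \sqrt{2\,\beta^{-1} D(x_t)}\, dW_t, \qquad \beta^{-1} \propto \eta / b,

with D the gradient-noise covariance, eta the learning rate and b the batch size; because D is generally anisotropic, the stationary distribution is not proportional to e^{-\beta f(x)}, which is where the implicit regularization and the limit cycles come from. This is a paraphrase of the setup, not the paper's exact statement.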

Non-vacuous generalization bounds at the ImageNet scale: a PAC-Bayesian compression approach

W Zhou, V Veitch, M Austern, RP Adams… - arXiv preprint arXiv …, 2018 - arxiv.org
Modern neural networks are highly overparameterized, with capacity to substantially overfit
to training data. Nevertheless, these networks often generalize well in practice. It has also …
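
The compression route to a PAC-Bayes bound ties generalization to the description length of the compressed network: if the trained weights can be encoded in |c| bits, an Occam-style argument gives, with probability at least 1 - delta, roughly

    e_D(\hat{h}) \;\lesssim\; \hat{e}_S(\hat{h}) + \sqrt{\frac{|c| \ln 2 + \ln\frac{1}{\delta}}{2m}},

so a network that compresses well can admit a nonvacuous bound even when the raw parameter count far exceeds the number of training examples m. The constants and the exact form are simplified here; the paper's bound is stated for a stochastic classifier.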