Do Bayesian neural networks need to be fully stochastic?

M Sharma, S Farquhar, E Nalisnick… - International …, 2023 - proceedings.mlr.press
We investigate the benefit of treating all the parameters in a Bayesian neural network
stochastically and find compelling theoretical and empirical evidence that this standard …

Separation of scales and a thermodynamic description of feature learning in some CNNs

I Seroussi, G Naveh, Z Ringel - Nature Communications, 2023 - nature.com
Deep neural networks (DNNs) are powerful tools for compressing and distilling information.
Their scale and complexity, often involving billions of inter-dependent parameters, render …

SAM as an optimal relaxation of Bayes

T Möllenhoff, ME Khan - arXiv preprint arXiv:2210.01620, 2022 - arxiv.org
Sharpness-aware minimization (SAM) and related adversarial deep-learning methods can
drastically improve generalization, but their underlying mechanisms are not yet fully …
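For orientation, the standard SAM objective (a textbook statement, not taken from this snippet) is
  min_θ max_{‖ε‖₂ ≤ ρ} L(θ + ε),
i.e. minimise the worst-case loss within a ρ-ball around the current weights; the title casts this perturbation step as a relaxation of a Bayesian (variational) objective.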

On the detrimental effect of invariances in the likelihood for variational inference

R Kurle, R Herbrich, T Januschowski… - Advances in …, 2022 - proceedings.neurips.cc
Variational Bayesian posterior inference often requires simplifying approximations such as
mean-field parametrisation to ensure tractability. However, prior work has associated the …
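The simplifying approximation referred to here is, in its standard form (not specific to this paper), the mean-field factorisation
  q(θ) = ∏_i q_i(θ_i),
which keeps the variational objective tractable at the cost of discarding correlations between parameters.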

Variational learning is effective for large deep networks

Y Shen, N Daheim, B Cong, P Nickl… - arXiv preprint arXiv …, 2024 - arxiv.org
We give extensive empirical evidence against the common belief that variational learning is
ineffective for large neural networks. We show that an optimizer called Improved Variational …

Markov chain score ascent: A unifying framework of variational inference with Markovian gradients

K Kim, J Oh, J Gardner, AB Dieng… - Advances in Neural …, 2022 - proceedings.neurips.cc
Minimizing the inclusive Kullback-Leibler (KL) divergence with stochastic gradient
descent (SGD) is challenging since its gradient is defined as an integral over the posterior …
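The difficulty alluded to is the standard identity (not taken from the paper) that the inclusive KL gradient is an expectation under the posterior p itself,
  ∇_λ KL(p ‖ q_λ) = -E_{p(θ)}[∇_λ log q_λ(θ)],
so it cannot be sampled from q_λ directly and is instead estimated with Markov-chain-driven ("Markovian") gradient schemes.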

Sparse MoEs meet efficient ensembles

JU Allingham, F Wenzel, ZE Mariet, B Mustafa… - arXiv preprint arXiv …, 2021 - arxiv.org
Machine learning models based on the aggregated outputs of submodels, either at the
activation or prediction levels, often exhibit strong performance compared to individual …
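The aggregation in question, in its simplest prediction-level form (a generic example, not this paper's architecture), is the ensemble average
  p(y | x) = (1/M) ∑_{m=1}^{M} p(y | x, θ_m),
whereas activation-level aggregation (as in sparse MoEs) combines submodel outputs inside the network rather than at the final prediction.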

Streamlining Prediction in Bayesian Deep Learning

R Li, M Klasson, A Solin, M Trapp - arXiv preprint arXiv:2411.18425, 2024 - arxiv.org
The rising interest in Bayesian deep learning (BDL) has led to a plethora of methods for
estimating the posterior distribution. However, efficient computation of inferences, such as …
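The inference computation at issue is, generically, the posterior predictive integral (standard in BDL, not this paper's contribution)
  p(y* | x*, D) = ∫ p(y* | x*, θ) p(θ | D) dθ,
typically approximated by Monte Carlo averaging over posterior samples, which is what makes prediction costly even once a posterior estimate is available.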

Law of large numbers for Bayesian two-layer neural network trained with variational inference

A Descours, T Huix, A Guillin, M Michel… - The Thirty Sixth …, 2023 - proceedings.mlr.press
We provide a rigorous analysis of training by variational inference (VI) of Bayesian neural
networks in the two-layer and infinite-width case. We consider a regression problem with a …
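For reference, the VI training objective being analysed is, in its generic form (not the paper's specific scaling), the ELBO
  ELBO(λ) = E_{q_λ(θ)}[log p(D | θ)] - KL(q_λ(θ) ‖ p(θ)),
whose optimisation the paper studies in the two-layer, infinite-width regression setting.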

On the disconnect between theory and practice of overparametrized neural networks

J Wenger, F Dangel, A Kristiadi - arXiv preprint arXiv:2310.00137, 2023 - arxiv.org
The infinite-width limit of neural networks (NNs) has garnered significant attention as a
theoretical framework for analyzing the behavior of large-scale, overparametrized networks …