Do Bayesian neural networks need to be fully stochastic?
We investigate the benefit of treating all the parameters in a Bayesian neural network
stochastically and find compelling theoretical and empirical evidence that this standard …
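A minimal sketch (in PyTorch, not the paper's code) of the partial-stochasticity question the abstract raises: a deterministic backbone with a mean-field Gaussian last layer sampled via the reparameterization trick. The class names, layer sizes, and initial log-sigma are illustrative assumptions.

import torch
import torch.nn as nn

class BayesianLinear(nn.Module):
    """Mean-field Gaussian linear layer (the only stochastic part)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_log_sigma = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.b = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Reparameterization: w = mu + sigma * eps, with eps ~ N(0, I)
        eps = torch.randn_like(self.w_mu)
        w = self.w_mu + self.w_log_sigma.exp() * eps
        return x @ w.t() + self.b

class PartiallyStochasticNet(nn.Module):
    def __init__(self, d_in=10, d_hidden=64, d_out=1):
        super().__init__()
        # Deterministic backbone: ordinary point-estimate weights.
        self.backbone = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        # Stochastic head: only these weights carry a posterior approximation.
        self.head = BayesianLinear(d_hidden, d_out)

    def forward(self, x):
        return self.head(self.backbone(x))

net = PartiallyStochasticNet()
x = torch.randn(8, 10)
# Each forward pass draws a fresh weight sample for the head only.
samples = torch.stack([net(x) for _ in range(5)])
print(samples.shape)  # torch.Size([5, 8, 1])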
Separation of scales and a thermodynamic description of feature learning in some CNNs
Deep neural networks (DNNs) are powerful tools for compressing and distilling information.
Their scale and complexity, often involving billions of inter-dependent parameters, render …
SAM as an optimal relaxation of Bayes
Sharpness-aware minimization (SAM) and related adversarial deep-learning methods can
drastically improve generalization, but their underlying mechanisms are not yet fully …
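To make the SAM mechanism referenced above concrete, here is a minimal sketch of the standard two-step SAM update (perturb the weights toward higher loss within an L2 ball, then descend using the gradient at the perturbed point). It is not the Bayesian relaxation derived in the paper; the function name, rho, learning rate, and toy objective are illustrative assumptions.

import torch

def sam_step(params, loss_fn, lr=0.1, rho=0.05):
    """One SAM step on a list of leaf parameter tensors."""
    # 1) Gradient at the current weights.
    loss = loss_fn(params)
    grads = torch.autograd.grad(loss, params)
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

    # 2) Ascend to the approximate worst case within an L2 ball of radius rho.
    perturbed = [p + rho * g / grad_norm for p, g in zip(params, grads)]

    # 3) The gradient taken at the perturbed weights drives the actual update.
    sharp_loss = loss_fn(perturbed)
    sharp_grads = torch.autograd.grad(sharp_loss, perturbed)
    with torch.no_grad():
        for p, g in zip(params, sharp_grads):
            p -= lr * g
    return loss.item()

# Toy usage: minimize a simple quadratic in two parameter tensors.
w = [torch.randn(3, requires_grad=True), torch.randn(1, requires_grad=True)]
quadratic = lambda ps: sum((p ** 2).sum() for p in ps)
for _ in range(10):
    sam_step(w, quadratic)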
On the detrimental effect of invariances in the likelihood for variational inference
Variational Bayesian posterior inference often requires simplifying approximations such as
mean-field parametrisation to ensure tractability. However, prior work has associated the …
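For reference, the mean-field approximation mentioned above in its textbook form (not the paper's specific construction): the posterior over parameters is replaced by a fully factorised q, fitted by maximising the evidence lower bound.

% Standard mean-field variational objective (assumed textbook form).
\begin{align}
  q_\phi(\theta) &= \prod_i q_{\phi_i}(\theta_i), \\
  \mathcal{L}(\phi) &= \mathbb{E}_{q_\phi(\theta)}\!\left[\log p(\mathcal{D} \mid \theta)\right]
    - \mathrm{KL}\!\left(q_\phi(\theta)\,\|\,p(\theta)\right)
    \;\le\; \log p(\mathcal{D}).
\end{align}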
Variational learning is effective for large deep networks
We give extensive empirical evidence against the common belief that variational learning is
ineffective for large neural networks. We show that an optimizer called Improved Variational …
Markov chain score ascent: A unifying framework of variational inference with Markovian gradients
Minimizing the inclusive Kullback-Leibler (KL) divergence with stochastic gradient
descent (SGD) is challenging since its gradient is defined as an integral over the posterior …
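The difficulty the abstract refers to can be seen from the standard identity for the inclusive KL gradient (not specific to this paper): the score of the variational family q_lambda must be averaged over the intractable posterior itself, which plain SGD cannot sample from directly.

% Gradient of the inclusive KL with respect to the variational parameters.
\nabla_\lambda \,\mathrm{KL}\!\left(p(\theta \mid \mathcal{D}) \,\|\, q_\lambda(\theta)\right)
  = -\,\mathbb{E}_{p(\theta \mid \mathcal{D})}\!\left[\nabla_\lambda \log q_\lambda(\theta)\right].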
Sparse MoEs meet efficient ensembles
Machine learning models based on the aggregated outputs of submodels, either at the
activation or prediction levels, often exhibit strong performance compared to individual …
Streamlining Prediction in Bayesian Deep Learning
The rising interest in Bayesian deep learning (BDL) has led to a plethora of methods for
estimating the posterior distribution. However, efficient computation of inferences, such as …
Law of large numbers for Bayesian two-layer neural network trained with variational inference
We provide a rigorous analysis of training by variational inference (VI) of Bayesian neural
networks in the two-layer and infinite-width case. We consider a regression problem with a …
On the disconnect between theory and practice of overparametrized neural networks
The infinite-width limit of neural networks (NNs) has garnered significant attention as a
theoretical framework for analyzing the behavior of large-scale, overparametrized networks …