A survey of uncertainty in deep neural networks
Over the last decade, neural networks have reached almost every field of science and
become a crucial part of various real-world applications. Due to the increasing spread …
Global convergence of Langevin dynamics based algorithms for nonconvex optimization
We present a unified framework to analyze the global convergence of Langevin dynamics
based algorithms for nonconvex finite-sum optimization with $n$ component functions. At …
Learning-rate annealing methods for deep neural networks
Deep neural networks (DNNs) have achieved great success over the last decades. DNNs are
optimized using stochastic gradient descent (SGD) with learning-rate annealing that …
Faster convergence of stochastic gradient Langevin dynamics for non-log-concave sampling
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD)
for sampling from a class of distributions that can be non-log-concave. At the core of our …
Accelerating approximate Thompson sampling with underdamped Langevin Monte Carlo
Approximate Thompson sampling with Langevin Monte Carlo broadens its reach
from Gaussian posterior sampling to encompass more general smooth posteriors. However …
Fractional underdamped Langevin dynamics: Retargeting SGD with momentum under heavy-tailed gradient noise
Stochastic gradient descent with momentum (SGDm) is one of the most popular optimization
algorithms in deep learning. While there is a rich theory of SGDm for convex problems, the …
Distributed learning systems with first-order methods
Scalable and efficient distributed learning is one of the main driving forces behind the recent
rapid advancement of machine learning and artificial intelligence. One prominent feature of …
Adaptive weight decay for deep neural networks
Regularization in the optimization of deep neural networks is often critical to avoid
undesirable over-fitting, leading to better generalization of the model. One of the most popular …
Primal dual interpretation of the proximal stochastic gradient Langevin algorithm
We consider the task of sampling with respect to a log-concave probability distribution. The
potential of the target distribution is assumed to be composite, i.e., written as the sum of a …
On the convergence of Hamiltonian Monte Carlo with stochastic gradients
Hamiltonian Monte Carlo (HMC), built on Hamilton's equations, has seen
great success in sampling from high-dimensional posterior distributions …