Interacting Langevin diffusions: Gradient structure and ensemble Kalman sampler
Solving inverse problems without the use of derivatives or adjoints of the forward model is
highly desirable in many applications arising in science and engineering. In this paper we …
Achieving high accuracy with PINNs via energy natural gradient descent
We propose energy natural gradient descent, a natural gradient method with respect to a
Hessian-induced Riemannian metric as an optimization algorithm for physics-informed …
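A minimal sketch of a natural-gradient step with respect to a Hessian-induced metric. It assumes a quadratic energy E(u) = 0.5 * ||L u - f||^2 on a discretized function u_theta. Here the energy Hessian is L^T L, so the pulled-back metric is G = J^T L^T L J and the step coincides with a Gauss-Newton step. The function name `energy_ngd_step`, the sine basis and the finite-difference Poisson setup are hypothetical illustrations, not the authors' PINN implementation.

```python
import numpy as np


def energy_ngd_step(theta, model, jac, L, f, eta=1.0):
    """One natural-gradient step for the quadratic energy E(u) = 0.5*||L u - f||^2.

    model(theta) returns the discretized function u_theta (shape [n]);
    jac(theta) returns its Jacobian du/dtheta (shape [n, p]).
    The energy Hessian in function space is L^T L; pulling it back through
    the Jacobian gives the metric G = J^T L^T L J.
    """
    u = model(theta)
    J = jac(theta)
    LJ = L @ J
    grad = LJ.T @ (L @ u - f)   # Euclidean gradient of E(u_theta) in theta
    G = LJ.T @ LJ               # Hessian-induced metric (may be singular)
    return theta - eta * np.linalg.pinv(G) @ grad


# Hypothetical test problem: -u'' = f on (0, 1) with u(0) = u(1) = 0,
# discretized by second-order finite differences on n interior points.
n = 50
h = 1.0 / (n + 1)
x = np.arange(1, n + 1) * h
L = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
f = np.pi**2 * np.sin(np.pi * x)

# Hypothetical linear model u_theta = Phi @ theta with a small sine basis.
Phi = np.stack([np.sin(k * np.pi * x) for k in range(1, 6)], axis=1)


def model(th):
    return Phi @ th


def jac(th):
    return Phi


theta = energy_ngd_step(np.zeros(5), model, jac, L, f)
print(np.linalg.norm(L @ model(theta) - f))  # residual of the discrete PDE
```

Because this illustrative model is linear in theta, a single step with eta = 1 lands on the least-squares minimizer. For a nonlinear network the step would be repeated, typically with a line search on eta.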
Finite-sample guarantees for Wasserstein distributionally robust optimization: Breaking the curse of dimensionality
R Gao - Operations Research, 2023 - pubsonline.informs.org
Wasserstein distributionally robust optimization (DRO) aims to find robust and generalizable
solutions by hedging against data perturbations in Wasserstein distance. Despite its recent …
On the geometry of Stein variational gradient descent
Bayesian inference problems require sampling or approximating high-dimensional
probability distributions. The focus of this paper is on the recently introduced Stein …
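For orientation, a minimal sketch of the standard Stein variational gradient descent particle update. It uses an RBF kernel with a median-heuristic bandwidth. It illustrates only the basic algorithm, not the geometric analysis of this paper. The function name `svgd_step`, the step size and the Gaussian target are hypothetical choices.

```python
import numpy as np


def svgd_step(X, grad_logp, step=0.05):
    """One SVGD update with an RBF kernel k(x, y) = exp(-||x - y||^2 / bw).

    X has shape [n, d] (n particles in d dimensions); grad_logp(X) returns the
    score of the target density at each particle, also shape [n, d].
    """
    n = X.shape[0]
    diffs = X[:, None, :] - X[None, :, :]           # diffs[j, i] = x_j - x_i
    sq = np.sum(diffs**2, axis=-1)                  # squared pairwise distances
    bw = max(np.median(sq) / np.log(n + 1), 1e-8)   # median-heuristic bandwidth
    K = np.exp(-sq / bw)                            # K[j, i] = k(x_j, x_i)
    # Gradient of k(x_j, x_i) with respect to x_j: pushes particles apart.
    grad_K = -2.0 / bw * diffs * K[:, :, None]
    # phi(x_i) = (1/n) * sum_j [k(x_j, x_i) * score(x_j) + grad_{x_j} k(x_j, x_i)]
    phi = (K.T @ grad_logp(X) + grad_K.sum(axis=0)) / n
    return X + step * phi


def grad_logp(X):
    # Score of the hypothetical target N(2, 1).
    return -(X - 2.0)


# Transport 100 particles drawn from N(0, 1) toward the target.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(100, 1))
for _ in range(2000):
    X = svgd_step(X, grad_logp)
print(X.mean(), X.std())
```

The kernel-weighted score term drives the particles toward high-density regions. The kernel-gradient term keeps them spread out, so the ensemble approximates the target rather than collapsing onto its mode.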
Practical quasi-Newton methods for training deep neural networks
We consider the development of practical stochastic quasi-Newton, and in particular
Kronecker-factored block diagonal BFGS and L-BFGS methods, for training deep neural …
Sparse optimization on measures with over-parameterized gradient descent
L Chizat - Mathematical Programming, 2022 - Springer
Minimizing a convex function of a measure with a sparsity-inducing penalty is a typical
problem arising, e.g., in sparse spikes deconvolution or two-layer neural networks training …
When optimal transport meets information geometry
G Khan, J Zhang - Information Geometry, 2022 - Springer
Information geometry and optimal transport are two distinct geometric frameworks
for modeling families of probability measures. In recent years, there has been a …
Improving sequence-to-sequence learning via optimal transport
Sequence-to-sequence models are commonly trained via maximum likelihood estimation
(MLE). However, standard MLE training considers a word-level objective, predicting the next …
High order spatial discretization for variational time implicit schemes: Wasserstein gradient flows and reaction-diffusion systems
We design and compute first-order implicit-in-time variational schemes with high-order
spatial discretization for initial value gradient flows in generalized optimal transport metric …
Efficient natural gradient descent methods for large-scale PDE-based optimization problems
We propose efficient numerical schemes for implementing the natural gradient descent
(NGD) for a broad range of metric spaces with applications to PDE-based optimization …