Interacting Langevin diffusions: Gradient structure and ensemble Kalman sampler

A Garbuno-Inigo, F Hoffmann, W Li, AM Stuart - SIAM Journal on Applied …, 2020 - SIAM
Solving inverse problems without the use of derivatives or adjoints of the forward model is
highly desirable in many applications arising in science and engineering. In this paper we …

Achieving high accuracy with PINNs via energy natural gradient descent

J Müller, M Zeinhofer - International Conference on Machine …, 2023 - proceedings.mlr.press
We propose energy natural gradient descent, a natural gradient method with respect to a
Hessian-induced Riemannian metric as an optimization algorithm for physics-informed …

Finite-sample guarantees for Wasserstein distributionally robust optimization: Breaking the curse of dimensionality

R Gao - Operations Research, 2023 - pubsonline.informs.org
Wasserstein distributionally robust optimization (DRO) aims to find robust and generalizable
solutions by hedging against data perturbations in Wasserstein distance. Despite its recent …

On the geometry of Stein variational gradient descent

A Duncan, N Nüsken, L Szpruch - Journal of Machine Learning Research, 2023 - jmlr.org
Bayesian inference problems require sampling or approximating high-dimensional
probability distributions. The focus of this paper is on the recently introduced Stein …

Practical quasi-Newton methods for training deep neural networks

D Goldfarb, Y Ren, A Bahamou - Advances in Neural …, 2020 - proceedings.neurips.cc
We consider the development of practical stochastic quasi-Newton, and in particular
Kronecker-factored block diagonal BFGS and L-BFGS methods, for training deep neural …

Sparse optimization on measures with over-parameterized gradient descent

L Chizat - Mathematical Programming, 2022 - Springer
Minimizing a convex function of a measure with a sparsity-inducing penalty is a typical
problem arising, e.g., in sparse spikes deconvolution or two-layer neural networks training …

When optimal transport meets information geometry

G Khan, J Zhang - Information Geometry, 2022 - Springer
Information geometry and optimal transport are two distinct geometric frameworks
for modeling families of probability measures. During the recent years, there has been a …

Improving sequence-to-sequence learning via optimal transport

L Chen, Y Zhang, R Zhang, C Tao, Z Gan… - arXiv preprint arXiv …, 2019 - arxiv.org
Sequence-to-sequence models are commonly trained via maximum likelihood estimation
(MLE). However, standard MLE training considers a word-level objective, predicting the next …

High order spatial discretization for variational time implicit schemes: Wasserstein gradient flows and reaction-diffusion systems

G Fu, S Osher, W Li - Journal of Computational Physics, 2023 - Elsevier
We design and compute first-order implicit-in-time variational schemes with high-order
spatial discretization for initial value gradient flows in generalized optimal transport metric …

Efficient natural gradient descent methods for large-scale PDE-based optimization problems

L Nurbekyan, W Lei, Y Yang - SIAM Journal on Scientific Computing, 2023 - SIAM
We propose efficient numerical schemes for implementing the natural gradient descent
(NGD) for a broad range of metric spaces with applications to PDE-based optimization …