Tensor decompositions for learning latent variable models

A Anandkumar, R Ge, DJ Hsu, SM Kakade… - J. Mach. Learn. Res …, 2014 - jmlr.org
This work considers a computationally and statistically efficient parameter estimation method
for a wide class of latent variable models—including Gaussian mixture models, hidden …

Improper learning for non-stochastic control

M Simchowitz, K Singh… - Conference on Learning …, 2020 - proceedings.mlr.press
We consider the problem of controlling a possibly unknown linear dynamical system with
adversarial perturbations, adversarially chosen convex loss functions, and partially …

Statistical query lower bounds for robust estimation of high-dimensional Gaussians and Gaussian mixtures

I Diakonikolas, DM Kane… - 2017 IEEE 58th Annual …, 2017 - ieeexplore.ieee.org
We describe a general technique that yields the first Statistical Query lower bounds for a
range of fundamental high-dimensional learning problems involving Gaussian distributions …

Multi-objective reinforcement learning using sets of Pareto dominating policies

K Van Moffaert, A Nowé - The Journal of Machine Learning Research, 2014 - jmlr.org
Many real-world problems involve the optimization of multiple, possibly conflicting
objectives. Multi-objective reinforcement learning (MORL) is a generalization of standard …

No bad local minima: Data independent training error guarantees for multilayer neural networks

D Soudry, Y Carmon - arXiv preprint arXiv:1605.08361, 2016 - arxiv.org
We use smoothed analysis techniques to provide guarantees on the training loss of
Multilayer Neural Networks (MNNs) at differentiable local minima. Specifically, we examine …

Mixture models, robustness, and sum of squares proofs

SB Hopkins, J Li - Proceedings of the 50th Annual ACM SIGACT …, 2018 - dl.acm.org
We use the Sum of Squares method to develop new efficient algorithms for learning well-
separated mixtures of Gaussians and robust mean estimation, both in high dimensions, that …

How to capture higher-order correlations? Generalizing matrix softmax attention to Kronecker computation

J Alman, Z Song - arXiv preprint arXiv:2310.04064, 2023 - arxiv.org
In the classical transformer attention scheme, we are given three $n \times d$ size matrices
$Q, K, V$ (the query, key, and value tokens), and the goal is to compute a new $n \times d$ …
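The snippet above states the classical attention computation concretely: given $n \times d$ matrices $Q, K, V$, form the $n \times n$ matrix $\exp(QK^\top)$, normalize each row, and multiply by $V$. A minimal NumPy sketch of that baseline (omitting the $1/\sqrt{d}$ scaling and the max-subtraction stabilization that practical implementations add):

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Classical attention: D^{-1} exp(Q K^T) V, where D normalizes
    each row of the n x n matrix exp(Q K^T) to sum to 1."""
    A = np.exp(Q @ K.T)                          # n x n attention scores
    D_inv = 1.0 / A.sum(axis=1, keepdims=True)   # row normalizers
    return D_inv * (A @ V)                       # new n x d matrix

rng = np.random.default_rng(0)
n, d = 4, 3
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = softmax_attention(Q, K, V)
print(out.shape)  # (4, 3)
```

The paper's question is how to go beyond this matrix form to capture higher-order (e.g. triple-wise) token correlations via Kronecker computation; the sketch only shows the pairwise baseline it generalizes.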

Beating the perils of non-convexity: Guaranteed training of neural networks using tensor methods

M Janzamin, H Sedghi, A Anandkumar - arXiv preprint arXiv:1506.08473, 2015 - arxiv.org
Training neural networks is a challenging non-convex optimization problem, and
backpropagation or gradient descent can get stuck in spurious local optima. We propose a …

Robust moment estimation and improved clustering via sum of squares

PK Kothari, J Steinhardt, D Steurer - … of the 50th Annual ACM SIGACT …, 2018 - dl.acm.org
We develop efficient algorithms for estimating low-degree moments of unknown distributions
in the presence of adversarial outliers and design a new family of convex relaxations for k …

List-decodable robust mean estimation and learning mixtures of spherical gaussians

I Diakonikolas, DM Kane, A Stewart - … of the 50th Annual ACM SIGACT …, 2018 - dl.acm.org
We study the problem of list-decodable (robust) Gaussian mean estimation and the related
problem of learning mixtures of separated spherical Gaussians. In the former problem, we …