Tensor decompositions for learning latent variable models
This work considers a computationally and statistically efficient parameter estimation method
for a wide class of latent variable models—including Gaussian mixture models, hidden …
Improper learning for non-stochastic control
We consider the problem of controlling a possibly unknown linear dynamical system with
adversarial perturbations, adversarially chosen convex loss functions, and partially …
Statistical query lower bounds for robust estimation of high-dimensional Gaussians and Gaussian mixtures
We describe a general technique that yields the first Statistical Query lower bounds for a
range of fundamental high-dimensional learning problems involving Gaussian distributions …
Multi-objective reinforcement learning using sets of Pareto dominating policies
Many real-world problems involve the optimization of multiple, possibly conflicting
objectives. Multi-objective reinforcement learning (MORL) is a generalization of standard …
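The notion of Pareto dominance underlying this line of work is simple to state: one policy's return vector dominates another's if it is at least as good in every objective and strictly better in at least one. A minimal sketch (names and the maximize-all-objectives convention are my assumptions, not from the paper):

```python
def dominates(a, b):
    # a Pareto-dominates b if a is >= b in every objective and
    # strictly > in at least one (all objectives maximized).
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    # Keep exactly the points not dominated by any other point.
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]
```

For example, `pareto_front([(1, 2), (2, 1), (0, 0), (2, 2)])` keeps only `(2, 2)`, since it dominates every other vector.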
No bad local minima: Data independent training error guarantees for multilayer neural networks
We use smoothed analysis techniques to provide guarantees on the training loss of
Multilayer Neural Networks (MNNs) at differentiable local minima. Specifically, we examine …
Mixture models, robustness, and sum of squares proofs
We use the Sum of Squares method to develop new efficient algorithms for learning well-
separated mixtures of Gaussians and robust mean estimation, both in high dimensions, that …
How to capture higher-order correlations? Generalizing matrix softmax attention to Kronecker computation
In the classical transformer attention scheme, we are given three $ n\times d $ size matrices
$Q, K, V$ (the query, key, and value tokens), and the goal is to compute a new $n \times d$ …
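The classical attention scheme this paper generalizes maps the $n \times d$ matrices $Q, K, V$ to an $n \times d$ output via a row-wise softmax over pairwise query–key scores. A minimal NumPy sketch of that baseline (the $1/\sqrt{d}$ scaling is the standard convention, not stated in the snippet):

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Pairwise scores: an n x n matrix of scaled query-key dot products.
    d = Q.shape[1]
    scores = Q @ K.T / np.sqrt(d)
    # Row-wise softmax, stabilized by subtracting each row's maximum.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # Output: n x d; each row is a convex combination of the rows of V.
    return weights @ V
```

Each output row mixes value rows according to that query's attention weights; the Kronecker generalization replaces the pairwise scores with higher-order ones.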
Beating the perils of non-convexity: Guaranteed training of neural networks using tensor methods
Training neural networks is a challenging non-convex optimization problem, and
backpropagation or gradient descent can get stuck in spurious local optima. We propose a …
Robust moment estimation and improved clustering via sum of squares
We develop efficient algorithms for estimating low-degree moments of unknown distributions
in the presence of adversarial outliers and design a new family of convex relaxations for k …
List-decodable robust mean estimation and learning mixtures of spherical Gaussians
We study the problem of list-decodable (robust) Gaussian mean estimation and the related
problem of learning mixtures of separated spherical Gaussians. In the former problem, we …