Hidden progress in deep learning: SGD learns parities near the computational limit
There is mounting evidence of emergent phenomena in the capabilities of deep learning
methods as we scale up datasets, model sizes, and training times. While there are some …
Statistical query lower bounds for robust estimation of high-dimensional Gaussians and Gaussian mixtures
We describe a general technique that yields the first Statistical Query lower bounds for a
range of fundamental high-dimensional learning problems involving Gaussian distributions …
Exploring connections between active learning and model extraction
Machine learning is being increasingly used by individuals, research institutions, and
corporations. This has resulted in the surge of Machine Learning-as-a-Service (MLaaS) …
Limitations of lazy training of two-layers neural network
We study the supervised learning problem under either of the following two models: (1)
Feature vectors $x_i$ are $d$-dimensional Gaussian and responses are $y_i = f^*(x_i)$ for $f^*$ an …
Near-optimal cryptographic hardness of agnostically learning halfspaces and ReLU regression under Gaussian marginals
We study the task of agnostically learning halfspaces under the Gaussian distribution.
Specifically, given labeled examples $(\mathbf{x}, y)$ from an unknown distribution on …
Reliably learning the ReLU in polynomial time
We give the first dimension-efficient algorithms for learning Rectified Linear Units (ReLUs),
which are functions of the form $\mathbf{x} \mapsto \max(0, \mathbf{w} \cdot \mathbf{x})$ …
Superpolynomial lower bounds for learning one-layer neural networks using gradient descent
We give the first superpolynomial lower bounds for learning one-layer neural networks with
respect to the Gaussian distribution for a broad class of algorithms. In the regression setting …
Near-optimal SQ lower bounds for agnostically learning halfspaces and ReLUs under Gaussian marginals
We study the fundamental problems of agnostically learning halfspaces and ReLUs under
Gaussian marginals. In the former problem, given labeled examples $(\mathbf{x}, y)$ from an …
Hardness of noise-free learning for two-hidden-layer neural networks
We give superpolynomial statistical query (SQ) lower bounds for learning two-hidden-layer
ReLU networks with respect to Gaussian inputs in the standard (noise-free) model. No …
Distribution-specific hardness of learning neural networks
O Shamir - Journal of Machine Learning Research, 2018 - jmlr.org
Although neural networks are routinely and successfully trained in practice using simple
gradient-based methods, most existing theoretical results are negative, showing that …