Physics-informed machine learning: A survey on problems, methods and applications
Recent advances in data-driven machine learning have revolutionized fields like computer
vision, reinforcement learning, and many scientific and engineering domains. In many real …
The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks
It is currently known how to characterize functions that neural networks can learn with SGD
for two extremal parametrizations: neural networks in the linear regime, and neural networks …
Shuffled model of differential privacy in federated learning
We consider a distributed empirical risk minimization (ERM) optimization problem with
communication efficiency and privacy requirements, motivated by the federated learning …
Sparsified SGD with memory
Huge scale machine learning problems are nowadays tackled by distributed optimization
algorithms, i.e., algorithms that leverage the compute power of many devices for training. The …
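The mechanism this paper is usually associated with is top-k gradient sparsification combined with an error-feedback "memory" that re-injects the coordinates dropped in earlier steps. A minimal single-worker sketch of that update rule, assuming a toy quadratic objective; the helper names are illustrative, not from the paper:

```python
import numpy as np

def topk_sparsify(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def sparsified_sgd_with_memory(grad_fn, x0, lr=0.1, k=2, steps=500):
    """Sketch of SGD with top-k sparsification and error feedback (memory)."""
    x = x0.astype(float).copy()
    memory = np.zeros_like(x)           # accumulates the mass dropped so far
    for _ in range(steps):
        g = grad_fn(x)
        corrected = lr * g + memory     # add back previously dropped coordinates
        update = topk_sparsify(corrected, k)
        memory = corrected - update     # remember what was not applied this step
        x -= update
    return x

# Toy example: minimize ||x - b||^2 / 2 in 10 dimensions.
b = np.arange(10, dtype=float)
print(np.round(sparsified_sgd_with_memory(lambda x: x - b, np.zeros(10)), 2))
```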
Local SGD converges fast and communicates little
S. U. Stich, arXiv preprint arXiv:1805.09767, 2018
Mini-batch stochastic gradient descent (SGD) is state of the art in large scale distributed
training. The scheme can reach a linear speedup with respect to the number of workers, but …
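Local SGD, the scheme analyzed here, has each worker run several SGD steps on its own data and only periodically average the iterates, which is what reduces communication. A minimal NumPy sketch under simplifying assumptions (full synchrony, one quadratic objective per worker; all names are illustrative):

```python
import numpy as np

def local_sgd(worker_grads, x0, lr=0.05, local_steps=10, rounds=20):
    """Each worker runs `local_steps` SGD steps locally, then iterates are averaged."""
    x = x0.astype(float).copy()
    for _ in range(rounds):
        local_iterates = []
        for grad in worker_grads:          # in practice these run in parallel
            xi = x.copy()
            for _ in range(local_steps):
                xi -= lr * grad(xi)
            local_iterates.append(xi)
        x = np.mean(local_iterates, axis=0)  # single communication per round
    return x

# Toy example: each worker holds a quadratic centered at a different point.
centers = [np.full(5, c) for c in (0.0, 1.0, 2.0, 3.0)]
grads = [lambda x, b=b: x - b for b in centers]
print(np.round(local_sgd(grads, np.zeros(5)), 3))  # approaches the mean center, 1.5
```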
DoG is SGD's best friend: A parameter-free dynamic step size schedule
We propose a tuning-free dynamic SGD step size formula, which we call Distance over
Gradients (DoG). The DoG step sizes depend on simple empirical quantities (distance from …
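As I understand the DoG rule from the paper, the step size at step t is the maximum distance of the iterates from the initial point divided by the square root of the sum of squared gradient norms seen so far, with a small initial movement parameter to avoid a zero first step. A hedged sketch; the helper name, parameter r_eps, and the toy objective are my own:

```python
import numpy as np

def dog(grad_fn, x0, r_eps=1e-4, steps=200):
    """Sketch of the DoG (Distance over Gradients) step size schedule."""
    x = x0.astype(float).copy()
    max_dist = r_eps                            # max distance from x0 observed so far
    grad_sq_sum = 0.0                           # running sum of squared gradient norms
    for _ in range(steps):
        g = grad_fn(x)
        grad_sq_sum += np.dot(g, g)
        eta = max_dist / np.sqrt(grad_sq_sum)   # parameter-free step size
        x -= eta * g
        max_dist = max(max_dist, np.linalg.norm(x - x0))
    return x

# Toy example: minimize ||x - b||^2 / 2 without tuning a learning rate.
b = np.array([3.0, -1.0, 2.0])
print(np.round(dog(lambda x: x - b, np.zeros(3)), 3))
```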
Large-scale methods for distributionally robust optimization
We propose and analyze algorithms for distributionally robust optimization of convex losses
with conditional value at risk (CVaR) and $\chi^2$ divergence uncertainty sets. We prove …
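The CVaR uncertainty set mentioned in the abstract amounts to re-weighting the worst losses: CVaR at level alpha of the per-example losses is the average of the largest alpha-fraction of them. A small illustrative snippet (function name and data are made up) contrasting the plain average with the CVaR objective:

```python
import numpy as np

def cvar(losses, alpha=0.1):
    """Average of the worst alpha-fraction of per-example losses (CVaR_alpha)."""
    losses = np.asarray(losses, dtype=float)
    k = max(1, int(np.ceil(alpha * losses.size)))
    worst = np.sort(losses)[-k:]          # the k largest losses
    return worst.mean()

losses = np.array([0.1, 0.2, 0.3, 0.4, 5.0, 0.2, 0.1, 0.3, 0.2, 0.4])
print("mean loss:", losses.mean())        # 0.72
print("CVaR_0.1 :", cvar(losses, 0.1))    # 5.0, dominated by the tail
```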
Deep-learning inversion: A next-generation seismic velocity model building method
Seismic velocity is one of the most important parameters used in seismic exploration.
Accurate velocity models are the key prerequisites for reverse time migration and other high …
SGD: General analysis and improved rates
We propose a general yet simple theorem describing the convergence of SGD under the
arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of …
Deep learning with label differential privacy
The Randomized Response (RR) algorithm is a classical technique to improve
robustness in survey aggregation, and has been widely adopted in applications with …
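Randomized Response, the primitive the abstract starts from, can be stated for K classes as: release the true label with probability e^eps / (e^eps + K - 1), otherwise a uniformly random other label, which satisfies eps-label-DP. A minimal sketch of that K-ary variant (function name and parameters are illustrative, not the paper's exact mechanism):

```python
import numpy as np

def randomized_response(label, num_classes, epsilon, rng=np.random.default_rng()):
    """K-ary randomized response: epsilon-DP randomization of a single label."""
    p_true = np.exp(epsilon) / (np.exp(epsilon) + num_classes - 1)
    if rng.random() < p_true:
        return label                       # report the true label
    other = rng.integers(num_classes - 1)  # otherwise a uniformly random other label
    return other if other < label else other + 1

# Example: privatize labels for a 10-class problem at epsilon = 1.
labels = [3, 7, 0, 9, 3]
print([randomized_response(y, num_classes=10, epsilon=1.0) for y in labels])
```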