A tour of reinforcement learning: The view from continuous control
B Recht - Annual Review of Control, Robotics, and Autonomous …, 2019 - annualreviews.org
This article surveys reinforcement learning from the perspective of optimization and control,
with a focus on continuous control applications. It reviews the general formulation …
with a focus on continuous control applications. It reviews the general formulation …
A primer on zeroth-order optimization in signal processing and machine learning: Principals, recent advances, and applications
Zeroth-order (ZO) optimization is a subset of gradient-free optimization that emerges in many
signal processing and machine learning (ML) applications. It is used for solving optimization …
signal processing and machine learning (ML) applications. It is used for solving optimization …
Fine-tuning language models with just forward passes
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but
as LMs grow in size, backpropagation requires a prohibitively large amount of memory …
as LMs grow in size, backpropagation requires a prohibitively large amount of memory …
Introduction to online convex optimization
E Hazan - Foundations and Trends® in Optimization, 2016 - nowpublishers.com
This monograph portrays optimization as a process. In many practical applications the
environment is so complex that it is infeasible to lay out a comprehensive theoretical model …
environment is so complex that it is infeasible to lay out a comprehensive theoretical model …
Hopskipjumpattack: A query-efficient decision-based attack
The goal of a decision-based adversarial attack on a trained model is to generate
adversarial examples based solely on observing output labels returned by the targeted …
adversarial examples based solely on observing output labels returned by the targeted …
Regret analysis of stochastic and nonstochastic multi-armed bandit problems
Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …
with an exploration-exploitation trade-off. This is the balance between staying with the option …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Online learning and online convex optimization
S Shalev-Shwartz - Foundations and Trends® in Machine …, 2012 - nowpublishers.com
Online learning is a well established learning paradigm which has both theoretical and
practical appeals. The goal of online learning is to make a sequence of accurate predictions …
practical appeals. The goal of online learning is to make a sequence of accurate predictions …
The statistical complexity of interactive decision making
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …
Private empirical risk minimization: Efficient algorithms and tight error bounds
Convex empirical risk minimization is a basic tool in machine learning and statistics. We
provide new algorithms and matching lower bounds for differentially private convex …
provide new algorithms and matching lower bounds for differentially private convex …