A tour of reinforcement learning: The view from continuous control

B Recht - Annual Review of Control, Robotics, and Autonomous …, 2019 - annualreviews.org
This article surveys reinforcement learning from the perspective of optimization and control,
with a focus on continuous control applications. It reviews the general formulation …

A primer on zeroth-order optimization in signal processing and machine learning: Principals, recent advances, and applications

S Liu, PY Chen, B Kailkhura, G Zhang… - IEEE Signal …, 2020 - ieeexplore.ieee.org
Zeroth-order (ZO) optimization is a subset of gradient-free optimization that emerges in many
signal processing and machine learning (ML) applications. It is used for solving optimization …

Fine-tuning language models with just forward passes

S Malladi, T Gao, E Nichani… - Advances in …, 2023 - proceedings.neurips.cc
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but
as LMs grow in size, backpropagation requires a prohibitively large amount of memory …

Introduction to online convex optimization

E Hazan - Foundations and Trends® in Optimization, 2016 - nowpublishers.com
This monograph portrays optimization as a process. In many practical applications the
environment is so complex that it is infeasible to lay out a comprehensive theoretical model …

Hopskipjumpattack: A query-efficient decision-based attack

J Chen, MI Jordan… - 2020 ieee symposium on …, 2020 - ieeexplore.ieee.org
The goal of a decision-based adversarial attack on a trained model is to generate
adversarial examples based solely on observing output labels returned by the targeted …

Regret analysis of stochastic and nonstochastic multi-armed bandit problems

S Bubeck, N Cesa-Bianchi - Foundations and Trends® in …, 2012 - nowpublishers.com
Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Online learning and online convex optimization

S Shalev-Shwartz - Foundations and Trends® in Machine …, 2012 - nowpublishers.com
Online learning is a well established learning paradigm which has both theoretical and
practical appeals. The goal of online learning is to make a sequence of accurate predictions …

The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arxiv preprint arxiv:2112.13487, 2021 - arxiv.org
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

Private empirical risk minimization: Efficient algorithms and tight error bounds

R Bassily, A Smith, A Thakurta - 2014 IEEE 55th annual …, 2014 - ieeexplore.ieee.org
Convex empirical risk minimization is a basic tool in machine learning and statistics. We
provide new algorithms and matching lower bounds for differentially private convex …