A tour of reinforcement learning: The view from continuous control
B Recht - Annual Review of Control, Robotics, and Autonomous …, 2019 - annualreviews.org
This article surveys reinforcement learning from the perspective of optimization and control,
with a focus on continuous control applications. It reviews the general formulation …
A primer on zeroth-order optimization in signal processing and machine learning: Principals, recent advances, and applications
Zeroth-order (ZO) optimization is a subset of gradient-free optimization that emerges in many
signal processing and machine learning (ML) applications. It is used for solving optimization …
Fine-tuning language models with just forward passes
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but
as LMs grow in size, backpropagation requires a prohibitively large amount of memory …
The statistical complexity of interactive decision making
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …
HopSkipJumpAttack: A query-efficient decision-based attack
The goal of a decision-based adversarial attack on a trained model is to generate
adversarial examples based solely on observing output labels returned by the targeted …
Derivative-free optimization methods
In many optimization problems arising from scientific, engineering and artificial intelligence
applications, objective and constraint functions are available only as the output of a black …
Efficient decision-based black-box adversarial attacks on face recognition
Face recognition has obtained remarkable progress in recent years due to the great
improvement of deep convolutional neural networks (CNNs). However, deep CNNs are …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Global convergence of policy gradient methods for the linear quadratic regulator
Direct policy gradient methods for reinforcement learning and continuous control problems
are a popular approach for a variety of reasons: 1) they are easy to implement without …
Optimal stochastic non-smooth non-convex optimization through online-to-non-convex conversion
A Cutkosky, H Mehta… - … Conference on Machine …, 2023 - proceedings.mlr.press
We present new algorithms for optimizing non-smooth, non-convex stochastic objectives
based on a novel analysis technique. This improves the current best-known complexity for …