Multi-agent reinforcement learning: A selective overview of theories and algorithms

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer
Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

Reinforcement learning for selective key applications in power systems: Recent advances and future challenges

X Chen, G Qu, Y Tang, S Low… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
With large-scale integration of renewable generation and distributed energy resources,
modern power systems are confronted with new operational challenges, such as growing …

[BOG][B] Control systems and reinforcement learning

S Meyn - 2022 - books.google.com
A high school student can create deep Q-learning code to control her robot, without any
understanding of the meaning of'deep'or'Q', or why the code sometimes fails. This book is …

Online robust reinforcement learning with model uncertainty

Y Wang, S Zou - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …

Federated reinforcement learning: Linear speedup under markovian sampling

S Khodadadian, P Sharma, G Joshi… - International …, 2022 - proceedings.mlr.press
Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling
observations from the environment is usually split across multiple agents. However …

Crpo: A new approach for safe reinforcement learning with convergence guarantee

T Xu, Y Liang, G Lan - International Conference on Machine …, 2021 - proceedings.mlr.press
In safe reinforcement learning (SRL) problems, an agent explores the environment to
maximize an expected total reward and meanwhile avoids violation of certain constraints on …

Finite-sample analysis for sarsa with linear function approximation

S Zou, T Xu, Y Liang - Advances in neural information …, 2019 - proceedings.neurips.cc
SARSA is an on-policy algorithm to learn a Markov decision process policy in reinforcement
learning. We investigate the SARSA algorithm with linear function approximation under the …

A finite-time analysis of two time-scale actor-critic methods

YF Wu, W Zhang, P Xu, Q Gu - Advances in Neural …, 2020 - proceedings.neurips.cc
Actor-critic (AC) methods have exhibited great empirical success compared with other
reinforcement learning algorithms, where the actor uses the policy gradient to improve the …

Breaking the sample size barrier in model-based reinforcement learning with a generative model

G Li, Y Wei, Y Chi, Y Gu… - Advances in neural …, 2020 - proceedings.neurips.cc
We investigate the sample efficiency of reinforcement learning in a $\gamma $-discounted
infinite-horizon Markov decision process (MDP) with state space S and action space A …

Is Q-learning minimax optimal? a tight sample complexity analysis

G Li, C Cai, Y Chen, Y Wei, Y Chi - Operations Research, 2024 - pubsonline.informs.org
Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP)
in a model-free fashion, lies at the heart of reinforcement learning. When it comes to the …