Multi-agent reinforcement learning: A selective overview of theories and algorithms

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer
Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

A review of safe reinforcement learning: Methods, theory and applications

S Gu, L Yang, Y Du, G Chen, F Walter, J Wang… - arxiv preprint arxiv …, 2022 - arxiv.org
Reinforcement Learning (RL) has achieved tremendous success in many complex decision-
making tasks. However, safety concerns are raised during deploying RL in real-world …

Fully decentralized multi-agent reinforcement learning with networked agents

K Zhang, Z Yang, H Liu, T Zhang… - … conference on machine …, 2018 - proceedings.mlr.press
We consider the fully decentralized multi-agent reinforcement learning (MARL) problem,
where the agents are connected via a time-varying and possibly sparse communication …

[BOOK][B] Algorithms for reinforcement learning

C Szepesvári - 2022 - books.google.com
Reinforcement learning is a learning paradigm concerned with learning to control a system
so as to maximize a numerical performance measure that expresses a long-term objective …

Reward constrained policy optimization

C Tessler, DJ Mankowitz, S Mannor - arxiv preprint arxiv:1805.11074, 2018 - arxiv.org
Solving tasks in Reinforcement Learning is no easy feat. As the goal of the agent is to
maximize the accumulated reward, it often learns to exploit loopholes and misspecifications …

[BOOK][B] Control systems and reinforcement learning

S Meyn - 2022 - books.google.com
A high school student can create deep Q-learning code to control her robot, without any
understanding of the meaning of'deep'or'Q', or why the code sometimes fails. This book is …

Risk-constrained reinforcement learning with percentile risk criteria

Y Chow, M Ghavamzadeh, L Janson… - Journal of Machine …, 2018 - jmlr.org
In many sequential decision-making problems one is interested in minimizing an expected
cumulative cost while taking into account risk, ie, increased awareness of events of small …

A finite time analysis of temporal difference learning with linear function approximation

J Bhandari, D Russo, R Singal - Conference on learning …, 2018 - proceedings.mlr.press
Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …

Policy gradient method for robust reinforcement learning

Y Wang, S Zou - International conference on machine …, 2022 - proceedings.mlr.press
This paper develops the first policy gradient method with global optimality guarantee and
complexity analysis for robust reinforcement learning under model mismatch. Robust …

Neural policy gradient methods: Global optimality and rates of convergence

L Wang, Q Cai, Z Yang, Z Wang - arxiv preprint arxiv:1909.01150, 2019 - arxiv.org
Policy gradient methods with actor-critic schemes demonstrate tremendous empirical
successes, especially when the actors and critics are parameterized by neural networks …