Multi-agent reinforcement learning: A selective overview of theories and algorithms
Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …
has registered tremendous success in solving various sequential decision-making problems …
A review of safe reinforcement learning: Methods, theory and applications
Reinforcement Learning (RL) has achieved tremendous success in many complex decision-
making tasks. However, safety concerns are raised during deploying RL in real-world …
making tasks. However, safety concerns are raised during deploying RL in real-world …
Fully decentralized multi-agent reinforcement learning with networked agents
We consider the fully decentralized multi-agent reinforcement learning (MARL) problem,
where the agents are connected via a time-varying and possibly sparse communication …
where the agents are connected via a time-varying and possibly sparse communication …
[BOOK][B] Algorithms for reinforcement learning
C Szepesvári - 2022 - books.google.com
Reinforcement learning is a learning paradigm concerned with learning to control a system
so as to maximize a numerical performance measure that expresses a long-term objective …
so as to maximize a numerical performance measure that expresses a long-term objective …
Reward constrained policy optimization
Solving tasks in Reinforcement Learning is no easy feat. As the goal of the agent is to
maximize the accumulated reward, it often learns to exploit loopholes and misspecifications …
maximize the accumulated reward, it often learns to exploit loopholes and misspecifications …
[BOOK][B] Control systems and reinforcement learning
S Meyn - 2022 - books.google.com
A high school student can create deep Q-learning code to control her robot, without any
understanding of the meaning of'deep'or'Q', or why the code sometimes fails. This book is …
understanding of the meaning of'deep'or'Q', or why the code sometimes fails. This book is …
Risk-constrained reinforcement learning with percentile risk criteria
In many sequential decision-making problems one is interested in minimizing an expected
cumulative cost while taking into account risk, ie, increased awareness of events of small …
cumulative cost while taking into account risk, ie, increased awareness of events of small …
A finite time analysis of temporal difference learning with linear function approximation
Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …
function corresponding to a given policy in a Markov decision process. Although TD is one of …
Policy gradient method for robust reinforcement learning
This paper develops the first policy gradient method with global optimality guarantee and
complexity analysis for robust reinforcement learning under model mismatch. Robust …
complexity analysis for robust reinforcement learning under model mismatch. Robust …
Neural policy gradient methods: Global optimality and rates of convergence
Policy gradient methods with actor-critic schemes demonstrate tremendous empirical
successes, especially when the actors and critics are parameterized by neural networks …
successes, especially when the actors and critics are parameterized by neural networks …