- Academic Search

Multi-agent reinforcement learning: A selective overview of theories and algorithms

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer

Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

Cited by 1705

A review of safe reinforcement learning: Methods, theory and applications

S Gu, L Yang, Y Du, G Chen, F Walter, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

Reinforcement Learning (RL) has achieved tremendous success in many complex
decision-making tasks. However, safety concerns are raised during deploying RL in real-world …

Cited by 297

Fully decentralized multi-agent reinforcement learning with networked agents

K Zhang, Z Yang, H Liu, T Zhang… - … conference on machine …, 2018 - proceedings.mlr.press

We consider the fully decentralized multi-agent reinforcement learning (MARL) problem,
where the agents are connected via a time-varying and possibly sparse communication …

Cited by 741

[BOOK][B] Algorithms for reinforcement learning

C Szepesvári - 2022 - books.google.com

Reinforcement learning is a learning paradigm concerned with learning to control a system
so as to maximize a numerical performance measure that expresses a long-term objective …

Cited by 2260

Reward constrained policy optimization

C Tessler, DJ Mankowitz, S Mannor - arXiv preprint arXiv:1805.11074, 2018 - arxiv.org

Solving tasks in Reinforcement Learning is no easy feat. As the goal of the agent is to
maximize the accumulated reward, it often learns to exploit loopholes and misspecifications …

Cited by 626

[BOOK][B] Control systems and reinforcement learning

S Meyn - 2022 - books.google.com

A high school student can create deep Q-learning code to control her robot, without any
understanding of the meaning of 'deep' or 'Q', or why the code sometimes fails. This book is …

Cited by 158

Risk-constrained reinforcement learning with percentile risk criteria

Y Chow, M Ghavamzadeh, L Janson… - Journal of Machine …, 2018 - jmlr.org

In many sequential decision-making problems one is interested in minimizing an expected
cumulative cost while taking into account risk, i.e., increased awareness of events of small …

Cited by 630

A finite time analysis of temporal difference learning with linear function approximation

J Bhandari, D Russo, R Singal - Conference on learning …, 2018 - proceedings.mlr.press

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value
function corresponding to a given policy in a Markov decision process. Although TD is one of …

Cited by 440

Policy gradient method for robust reinforcement learning

Y Wang, S Zou - International conference on machine …, 2022 - proceedings.mlr.press

This paper develops the first policy gradient method with global optimality guarantee and
complexity analysis for robust reinforcement learning under model mismatch. Robust …

Cited by 76

Neural policy gradient methods: Global optimality and rates of convergence

L Wang, Q Cai, Z Yang, Z Wang - arXiv preprint arXiv:1909.01150, 2019 - arxiv.org

Policy gradient methods with actor-critic schemes demonstrate tremendous empirical
successes, especially when the actors and critics are parameterized by neural networks …

Cited by 271
