- Academic Search

Multi-agent reinforcement learning: A selective overview of theories and algorithms

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer

Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

Cited by 1700

An overview of multi-agent reinforcement learning from game theoretical perspective

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org

Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

Cited by 350

Is pessimism provably efficient for offline RL?

Y Jin, Z Yang, Z Wang - International Conference on …, 2021 - proceedings.mlr.press

We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …

Cited by 449

A theoretical analysis of deep Q-learning

J Fan, Z Wang, Y Xie, Z Yang - Learning for dynamics and …, 2020 - proceedings.mlr.press

Despite the great empirical success of deep reinforcement learning, its theoretical
foundation is less well understood. In this work, we make the first attempt to theoretically …

Cited by 857

Bridging offline reinforcement learning and imitation learning: A tale of pessimism

P Rashidinejad, B Zhu, C Ma, J Jiao… - Advances in Neural …, 2021 - proceedings.neurips.cc

Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from
a fixed dataset without active data collection. Based on the composition of the offline dataset …

Cited by 315

Single and multi-agent deep reinforcement learning for AI-enabled wireless networks: A tutorial

A Feriani, E Hossain - IEEE Communications Surveys & …, 2021 - ieeexplore.ieee.org

Deep Reinforcement Learning (DRL) has recently witnessed significant advances that have
led to multiple successes in solving sequential decision-making problems in various …

Cited by 307

On the theory of policy gradient methods: Optimality, approximation, and distribution shift

A Agarwal, SM Kakade, JD Lee, G Mahajan - Journal of Machine Learning …, 2021 - jmlr.org

Policy gradient methods are among the most effective methods in challenging reinforcement
learning problems with large state and/or action spaces. However, little is known about even …

Cited by 499

A review of cooperative multi-agent deep reinforcement learning

A Oroojlooy, D Hajinezhad - Applied Intelligence, 2023 - Springer

Deep Reinforcement Learning has made significant progress in multi-agent
systems in recent years. The aim of this review article is to provide an overview of recent …

Cited by 511

Optimality and approximation with policy gradient methods in Markov decision processes

A Agarwal, SM Kakade, JD Lee… - … on Learning Theory, 2020 - proceedings.mlr.press

Policy gradient (PG) methods are among the most effective methods in challenging
reinforcement learning problems with large state and/or action spaces. However, little is …

Cited by 409

Natural policy gradient primal-dual method for constrained Markov decision processes

D Ding, K Zhang, T Başar… - Advances in Neural …, 2020 - proceedings.neurips.cc

We study sequential decision-making problems in which each agent aims to maximize the
expected total reward while satisfying a constraint on the expected total utility. We employ …

Cited by 221

Neural trust region/proximal policy optimization attains globally optimal policy

Multi-agent reinforcement learning: A selective overview of theories and algorithms