Towards continual reinforcement learning: A review and perspectives

K Khetarpal, M Riemer, I Rish, D Precup - Journal of Artificial Intelligence …, 2022 - jair.org
In this article, we aim to provide a literature review of different formulations and approaches
to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We …

A survey of opponent modeling in adversarial domains

S Nashed, S Zilberstein - Journal of Artificial Intelligence Research, 2022 - jair.org
Opponent modeling is the ability to use prior knowledge and observations in order to predict
the behavior of an opponent. This survey presents a comprehensive overview of existing …

Order matters: Agent-by-agent policy optimization

X Wang, Z Tian, Z Wan, Y Wen, J Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
While multi-agent trust region algorithms have achieved great success empirically in solving
coordination tasks, most of them, however, suffer from a non-stationarity problem since …

A multiagent cooperative learning system with evolution of social roles

Y Hou, M Sun, Y Zeng, YS Ong, Y Jin… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Recent developments in reinforcement learning (RL) have been able to derive optimal
policies for sophisticated and capable agents, and shown to achieve human-level …

Game-Theoretic Driver Modeling and Decision-Making for Autonomous Driving with Temporal-Spatial Attention-Based Deep Q-Learning

X Zhou, Z Peng, Y Xie, M Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The safe and efficient navigation of autonomous vehicles in complex traffic scenarios
remains a significant challenge. One of the key impediments is the limited effectiveness of …

GCEN: Multiagent Deep Reinforcement Learning With Grouped Cognitive Feature Representation

H Gao, X Xu, C Yan, Y Lan… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In recent years, cooperative multiagent deep reinforcement learning (MADRL) has received
increasing research interest and has been widely applied to computer games and …

Trust region bounds for decentralized PPO under non-stationarity

M Sun, S Devlin, J Beck, K Hofmann… - arXiv preprint arXiv …, 2022 - arxiv.org
We present trust region bounds for optimizing decentralized policies in cooperative Multi-
Agent Reinforcement Learning (MARL), which holds even when the transition dynamics are …

JointPPO: Diving deeper into the effectiveness of PPO in multi-agent reinforcement learning

C Liu, G Liu - arXiv preprint arXiv:2404.11831, 2024 - arxiv.org
While Centralized Training with Decentralized Execution (CTDE) has become the prevailing
paradigm in Multi-Agent Reinforcement Learning (MARL), it may not be suitable for …

A game-theoretic approach to multi-agent trust region optimization

Y Wen, H Chen, Y Yang, M Li, Z Tian, X Chen… - … on Distributed Artificial …, 2022 - Springer
Trust region methods are widely applied in single-agent reinforcement learning problems
due to their monotonic performance-improvement guarantee at every iteration. Nonetheless …

Monotonic improvement guarantees under non-stationarity for decentralized PPO

M Sun, S Devlin, JA Beck, K Hofmann, S Whiteson - 2022 - openreview.net
We present a new monotonic improvement guarantee for optimizing decentralized policies
in cooperative Multi-Agent Reinforcement Learning (MARL), which holds even when the …