Independent policy gradient for large-scale Markov potential games: Sharper rates, function approximation, and game-agnostic convergence

D Ding, CY Wei, K Zhang… - … Conference on Machine …, 2022 - proceedings.mlr.press
We examine global non-asymptotic convergence properties of policy gradient methods for
multi-agent reinforcement learning (RL) problems in Markov potential games (MPGs). To …

The role of baselines in policy gradient optimization

J Mei, W Chung, V Thomas, B Dai… - Advances in …, 2022 - proceedings.neurips.cc
We study the effect of baselines in on-policy stochastic policy gradient optimization, and
close the gap between the theory and practice of policy optimization methods. Our first …

Decentralized cooperative reinforcement learning with hierarchical information structure

H Kao, CY Wei, V Subramanian - … Conference on Algorithmic …, 2022 - proceedings.mlr.press
Multi-agent reinforcement learning (MARL) problems are challenging due to information
asymmetry. To overcome this challenge, existing methods often require high level of …

Independent natural policy gradient methods for potential games: Finite-time global convergence with entropy regularization

S Cen, F Chen, Y Chi - 2022 IEEE 61st Conference on Decision …, 2022 - ieeexplore.ieee.org
A major challenge in multi-agent systems is that the system complexity grows dramatically
with the number of agents as well as the size of their action spaces, which is typical in real …

Context-aware Bayesian network actor-critic methods for cooperative multi-agent reinforcement learning

D Chen, Q Zhang - International Conference on Machine …, 2023 - proceedings.mlr.press
Executing actions in a correlated manner is a common strategy for human coordination that
often leads to better cooperation, which is also potentially beneficial for cooperative multi …

Provable reinforcement learning for constrained and multi-agent control systems

D Ding - 2022 - viterbi-web.usc.edu
Reinforcement Learning (RL) is an algorithmic paradigm for sequential decision-making in
which a controller (or an agent) aims to maximize the task-associated long-term reward by …

The logarithmic barrier method for optimization

郑陈轩 - Advances in Applied Mathematics, 2022 - hanspub.org
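Logarithmic barrier function methods are very popular for solving inequality-constrained optimization problems. It is well known that logarithmic barrier functions play an important role in
interior-point methods for linear programming and linear semidefinite programming. This paper mainly introduces the logarithmic barrier method and its algorithm …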