An overview of multi-agent reinforcement learning from game theoretical perspective

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org
Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

Reinforcement learning: A tutorial survey and recent advances

A Gosavi - INFORMS Journal on Computing, 2009 - pubsonline.informs.org
In the last few years, reinforcement learning (RL), also called adaptive (or approximate)
dynamic programming, has emerged as a powerful tool for solving complex sequential …

Adversarially trained actor critic for offline reinforcement learning

CA Cheng, T Xie, N Jiang… - … Conference on Machine …, 2022 - proceedings.mlr.press
Abstract We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm
for offline reinforcement learning (RL) under insufficient data coverage, based on the …

A two-timescale stochastic algorithm framework for bilevel optimization: Complexity analysis and application to actor-critic

M Hong, HT Wai, Z Wang, Z Yang - SIAM Journal on Optimization, 2023 - SIAM
This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization.
Bilevel optimization is a class of problems which exhibits a two-level structure, and its goal is …

A two-level charging scheduling method for public electric vehicle charging stations considering heterogeneous demand and nonlinear charging profile

Z Zhao, CKM Lee, J Ren - Applied Energy, 2024 - Elsevier
This paper investigates the electric vehicle (EV) charging scheduling problem for public EV
charging stations (EVCSs) that can accommodate heterogeneous charging demands …

GANs trained by a two time-scale update rule converge to a local Nash equilibrium

M Heusel, H Ramsauer, T Unterthiner… - Advances in neural …, 2017 - proceedings.neurips.cc
Abstract Generative Adversarial Networks (GANs) excel at creating realistic images with
complex models for which maximum likelihood is infeasible. However, the convergence of …

SBEED: Convergent reinforcement learning with nonlinear function approximation

B Dai, A Shaw, L Li, L Xiao, N He… - International …, 2018 - proceedings.mlr.press
When function approximation is used, solving the Bellman optimality equation with stability
guarantees has remained a major open problem in reinforcement learning for decades. The …

RUDDER: Return decomposition for delayed rewards

JA Arjona-Medina, M Gillhofer… - Advances in …, 2019 - proceedings.neurips.cc
We propose RUDDER, a novel reinforcement learning approach for delayed rewards in
finite Markov decision processes (MDPs). In MDPs the Q-values are equal to the expected …

FedGAN: Federated generative adversarial networks for distributed data

M Rasouli, T Sun, R Rajagopal - arXiv preprint arXiv:2006.07228, 2020 - arxiv.org
We propose Federated Generative Adversarial Network (FedGAN) for training a GAN across
distributed sources of non-independent-and-identically-distributed data sources subject to …

Parametrized deep Q-networks learning: Reinforcement learning with discrete-continuous hybrid action space

J Xiong, Q Wang, Z Yang, P Sun, L Han… - arXiv preprint arXiv …, 2018 - arxiv.org
Most existing deep reinforcement learning (DRL) frameworks consider either discrete action
space or continuous action space solely. Motivated by applications in computer games, we …