An overview of multi-agent reinforcement learning from game theoretical perspective
Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org
Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …
Reinforcement learning: A tutorial survey and recent advances
A Gosavi - INFORMS Journal on Computing, 2009 - pubsonline.informs.org
In the last few years, reinforcement learning (RL), also called adaptive (or approximate)
dynamic programming, has emerged as a powerful tool for solving complex sequential …
Adversarially trained actor critic for offline reinforcement learning
We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm
for offline reinforcement learning (RL) under insufficient data coverage, based on the …
A two-timescale stochastic algorithm framework for bilevel optimization: Complexity analysis and application to actor-critic
This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization.
Bilevel optimization is a class of problems which exhibits a two-level structure, and its goal is …
A two-level charging scheduling method for public electric vehicle charging stations considering heterogeneous demand and nonlinear charging profile
This paper investigates the electric vehicle (EV) charging scheduling problem for public EV
charging stations (EVCSs) that can accommodate heterogeneous charging demands …
GANs trained by a two time-scale update rule converge to a local Nash equilibrium
Generative Adversarial Networks (GANs) excel at creating realistic images with
complex models for which maximum likelihood is infeasible. However, the convergence of …
SBEED: Convergent reinforcement learning with nonlinear function approximation
When function approximation is used, solving the Bellman optimality equation with stability
guarantees has remained a major open problem in reinforcement learning for decades. The …
RUDDER: Return decomposition for delayed rewards
We propose RUDDER, a novel reinforcement learning approach for delayed rewards in
finite Markov decision processes (MDPs). In MDPs the Q-values are equal to the expected …
FedGAN: Federated generative adversarial networks for distributed data
We propose Federated Generative Adversarial Network (FedGAN) for training a GAN across
distributed sources of non-independent-and-identically-distributed data sources subject to …
Parametrized deep Q-networks learning: Reinforcement learning with discrete-continuous hybrid action space
Most existing deep reinforcement learning (DRL) frameworks consider either discrete action
space or continuous action space solely. Motivated by applications in computer games, we …