Turnitin
降AI改写
早检测系统
早降重系统
Turnitin-UK版
万方检测-期刊版
维普编辑部版
Grammarly检测
Paperpass检测
checkpass检测
PaperYY检测
An overview of multi-agent reinforcement learning from game theoretical perspective
Y Yang, J Wang - arxiv preprint arxiv:2011.00583, 2020 - arxiv.org
Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …
On the theory of policy gradient methods: Optimality, approximation, and distribution shift
Policy gradient methods are among the most effective methods in challenging reinforcement
learning problems with large state and/or action spaces. However, little is known about even …
learning problems with large state and/or action spaces. However, little is known about even …
[Књига][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
and the multi-armed bandit model is a commonly used framework to address it. This …
Optimality and approximation with policy gradient methods in markov decision processes
Policy gradient (PG) methods are among the most effective methods in challenging
reinforcement learning problems with large state and/or action spaces. However, little is …
reinforcement learning problems with large state and/or action spaces. However, little is …
Provably efficient exploration in policy optimization
While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …
it is significantly less understood in theory, especially compared with value-based RL. In …
Introduction to online convex optimization
E Hazan - Foundations and Trends® in Optimization, 2016 - nowpublishers.com
This monograph portrays optimization as a process. In many practical applications the
environment is so complex that it is infeasible to lay out a comprehensive theoretical model …
environment is so complex that it is infeasible to lay out a comprehensive theoretical model …
Regret analysis of stochastic and nonstochastic multi-armed bandit problems
Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …
with an exploration-exploitation trade-off. This is the balance between staying with the option …
A unified view of entropy-regularized markov decision processes
We propose a general framework for entropy-regularized average-reward reinforcement
learning in Markov decision processes (MDPs). Our approach is based on extending the …
learning in Markov decision processes (MDPs). Our approach is based on extending the …
A reinforcement learning-variable neighborhood search method for the capacitated vehicle routing problem
Finding the best sequence of local search operators that yields the optimal performance of
Variable Neighborhood Search (VNS) is an important open research question in the field of …
Variable Neighborhood Search (VNS) is an important open research question in the field of …
Non-stationary reinforcement learning without prior knowledge: An optimal black-box approach
We propose a black-box reduction that turns a certain reinforcement learning algorithm with
optimal regret in a (near-) stationary environment into another algorithm with optimal …
optimal regret in a (near-) stationary environment into another algorithm with optimal …