Google Scholar

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org

Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

Cited by 352 · All versions (2)

[PDF] jmlr.org

On the theory of policy gradient methods: Optimality, approximation, and distribution shift

A Agarwal, SM Kakade, JD Lee, G Mahajan - Journal of Machine Learning …, 2021 - jmlr.org

Policy gradient methods are among the most effective methods in challenging reinforcement
learning problems with large state and/or action spaces. However, little is known about even …

Cited by 515 · All versions (13)

[PDF] tor-lattimore.com

[Book][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3356 · All versions (9)

[PDF] mlr.press

Optimality and approximation with policy gradient methods in Markov decision processes

A Agarwal, SM Kakade, JD Lee… - … on Learning Theory, 2020 - proceedings.mlr.press

Policy gradient (PG) methods are among the most effective methods in challenging
reinforcement learning problems with large state and/or action spaces. However, little is …

Cited by 400 · All versions (3)

[PDF] mlr.press

Provably efficient exploration in policy optimization

Q Cai, Z Yang, C Jin, Z Wang - International Conference on …, 2020 - proceedings.mlr.press

While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …

Cited by 324 · All versions (10)

[PDF] nowpublishers.com

Introduction to online convex optimization

E Hazan - Foundations and Trends® in Optimization, 2016 - nowpublishers.com

This monograph portrays optimization as a process. In many practical applications the
environment is so complex that it is infeasible to lay out a comprehensive theoretical model …

Cited by 2219 · All versions (19)

[PDF] nowpublishers.com

Regret analysis of stochastic and nonstochastic multi-armed bandit problems

S Bubeck, N Cesa-Bianchi - Foundations and Trends® in …, 2012 - nowpublishers.com

Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …

Cited by 3299 · All versions (26)

[PDF] arxiv.org

A unified view of entropy-regularized Markov decision processes

G Neu, A Jonsson, V Gómez - arXiv preprint arXiv:1705.07798, 2017 - arxiv.org

We propose a general framework for entropy-regularized average-reward reinforcement
learning in Markov decision processes (MDPs). Our approach is based on extending the …

Cited by 296 · All versions (7)

[PDF] uom.gr

A reinforcement learning-variable neighborhood search method for the capacitated vehicle routing problem

P Kalatzantonakis, A Sifaleras, N Samaras - Expert Systems with …, 2023 - Elsevier

Finding the best sequence of local search operators that yields the optimal performance of
Variable Neighborhood Search (VNS) is an important open research question in the field of …

Cited by 52 · All versions (7)

[PDF] mlr.press

Non-stationary reinforcement learning without prior knowledge: An optimal black-box approach

CY Wei, H Luo - Conference on learning theory, 2021 - proceedings.mlr.press

We propose a black-box reduction that turns a certain reinforcement learning algorithm with
optimal regret in a (near-) stationary environment into another algorithm with optimal …

Cited by 124 · All versions (4)

Online Markov decision processes under bandit feedback

An overview of multi-agent reinforcement learning from game theoretical perspective
