- Academic Search

Provably efficient exploration in policy optimization

Q Cai, Z Yang, C Jin, Z Wang - International Conference on …, 2020 - proceedings.mlr.press

While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …

Cited by 323

[PDF] bookfusion.com

[BOOK] Algorithms for reinforcement learning

C Szepesvári - 2022 - books.google.com

Reinforcement learning is a learning paradigm concerned with learning to control a system
so as to maximize a numerical performance measure that expresses a long-term objective …

Cited by 2256

[PDF] mlr.press

Non-stationary reinforcement learning without prior knowledge: An optimal black-box approach

CY Wei, H Luo - Conference on learning theory, 2021 - proceedings.mlr.press

We propose a black-box reduction that turns a certain reinforcement learning algorithm with
optimal regret in a (near-) stationary environment into another algorithm with optimal …

Cited by 122

[PDF] arxiv.org

A unified view of entropy-regularized markov decision processes

G Neu, A Jonsson, V Gómez - arXiv preprint arXiv:1705.07798, 2017 - arxiv.org

We propose a general framework for entropy-regularized average-reward reinforcement
learning in Markov decision processes (MDPs). Our approach is based on extending the …

Cited by 292

[PDF] neurips.cc

Corruption-robust offline reinforcement learning with general function approximation

C Ye, R Yang, Q Gu, T Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc

We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …

Cited by 18

[PDF] mlr.press

Learning adversarial markov decision processes with bandit feedback and unknown transition

C Jin, T Jin, H Luo, S Sra, T Yu - International Conference on …, 2020 - proceedings.mlr.press

We consider the task of learning in episodic finite-horizon Markov decision processes with
an unknown transition function, bandit feedback, and adversarial losses. We propose an …

Cited by 110

[PDF] mlr.press

Online convex optimization in adversarial markov decision processes

A Rosenberg, Y Mansour - International Conference on …, 2019 - proceedings.mlr.press

We consider online learning in episodic loop-free Markov decision processes (MDPs),
where the loss function can change arbitrarily between episodes, and the transition function …

Cited by 158

[PDF] mlr.press

Corruption-robust algorithms with uncertainty weighting for nonlinear contextual bandits and markov decision processes

C Ye, W Xiong, Q Gu, T Zhang - International Conference on …, 2023 - proceedings.mlr.press

Despite the significant interest and progress in reinforcement learning (RL) problems with
adversarial corruption, current works are either confined to the linear setting or lead to an …

Cited by 27

[PDF] neurips.cc

Combinatorial pure exploration of multi-armed bandits

S Chen, T Lin, I King, MR Lyu… - Advances in neural …, 2014 - proceedings.neurips.cc

We study the combinatorial pure exploration (CPE) problem in the stochastic multi-armed
bandit setting, where a learner explores a set of arms with the objective of identifying …

Cited by 252

[PDF] mlr.press

A model selection approach for corruption robust reinforcement learning

CY Wei, C Dann, J Zimmert - International Conference on …, 2022 - proceedings.mlr.press

We develop a model selection approach to tackle reinforcement learning with adversarial
corruption in both transition and reward. For finite-horizon tabular MDPs, without prior …

Cited by 61

The Online Loop-free Stochastic Shortest-Path Problem.

Provably efficient exploration in policy optimization
