- Academic Search

[Book] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3323 · All 9 versions
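The snippet introduces the multi-armed bandit model. As a hedged illustration that is not drawn from the book's text, the following Python sketch runs UCB1, one standard bandit algorithm, on Bernoulli arms. The arm means, horizon, seed, and the function name `ucb1` are hypothetical.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return per-arm pull counts and pseudo-regret."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k          # number of pulls per arm
    sums = [0.0] * k          # total observed reward per arm
    best = max(arm_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1       # play each arm once to initialise the estimates
        else:
            # Empirical mean plus a confidence bonus that shrinks with more pulls.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - arm_means[arm]   # expected loss versus always pulling the best arm
    return counts, regret


counts, regret = ucb1([0.2, 0.5, 0.7], horizon=5000)
print(counts, round(regret, 1))
```

Most pulls should go to the arm with mean 0.7; the pseudo-regret uses true means, so it measures the algorithm rather than reward noise.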

[PDF] mlr.press

Minimax regret bounds for reinforcement learning

MG Azar, I Osband, R Munos - International conference on …, 2017 - proceedings.mlr.press

We consider the problem of provably optimal exploration in reinforcement learning for finite
horizon MDPs. We show that an optimistic modification to value iteration achieves a regret …

Cited by 900 · All 5 versions
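The snippet describes an optimistic modification to value iteration for finite-horizon MDPs. A minimal sketch of that general idea follows, assuming a tabular MDP with known rewards in [0, 1], empirical transition estimates, and a simple Hoeffding-style bonus c·H·sqrt(log(1/δ)/n). The constant `c`, the value of `delta`, and all names are hypothetical; this is not the paper's exact bonus or algorithm.

```python
import math


def optimistic_value_iteration(p_hat, rewards, counts, horizon, delta=0.1, c=1.0):
    """Backward induction with an exploration bonus added to every (s, a) backup.

    p_hat[s][a][s2]: empirical transition probability
    rewards[s][a]:   known mean reward in [0, 1]
    counts[s][a]:    number of visits to (s, a)
    Returns q[h][s][a] for h = 0 .. horizon - 1.
    """
    n_states = len(rewards)
    n_actions = len(rewards[0])
    v_next = [0.0] * n_states              # value after the final step is zero
    q = [None] * horizon
    for h in reversed(range(horizon)):
        q_h = [[0.0] * n_actions for _ in range(n_states)]
        for s in range(n_states):
            for a in range(n_actions):
                n = max(counts[s][a], 1)
                bonus = c * horizon * math.sqrt(math.log(1.0 / delta) / n)
                backup = rewards[s][a] + sum(p * v for p, v in zip(p_hat[s][a], v_next))
                # Cap at the largest return achievable from step h onward.
                q_h[s][a] = min(backup + bonus, horizon - h)
        q[h] = q_h
        v_next = [max(row) for row in q_h]
    return q


# One state, two actions: action 1 is barely explored, so its bonus dominates.
q = optimistic_value_iteration(
    p_hat=[[[1.0], [1.0]]], rewards=[[0.6, 0.4]], counts=[[1000, 2]], horizon=3)
print(q[0][0])
```

The rarely tried action ends up with the higher optimistic value despite its lower reward, which is the mechanism that drives exploration.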

[PDF] quantum-journal.org

[PDF] Mitigation of readout noise in near-term quantum devices by classical post-processing based on detector tomography

FB Maciejewski, Z Zimborás, M Oszmaniec - Quantum, 2020 - quantum-journal.org

We propose a simple scheme to reduce readout errors in experiments on quantum systems
with finite number of measurement outcomes. Our method relies on performing classical …

Cited by 276 · All 9 versions
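The snippet describes reducing readout errors through classical post-processing. Below is a minimal single-qubit sketch of the simplest such correction. It assumes the 2×2 confusion matrix has already been estimated, for example by preparing |0⟩ and |1⟩ many times and tallying outcomes, and that plain matrix inversion followed by clipping is acceptable. The function name and numbers are hypothetical, and the paper's detector-tomography-based procedure is more general than this.

```python
def correct_readout_1q(p_obs, confusion):
    """Undo single-qubit readout noise by inverting a 2x2 confusion matrix.

    confusion[i][j] = Pr(read outcome i | qubit prepared in state j); columns sum to 1.
    p_obs = observed outcome frequencies [Pr(0), Pr(1)].
    """
    (a, b), (c, d) = confusion
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("confusion matrix is singular")
    inv = [[d / det, -b / det], [-c / det, a / det]]
    p = [inv[0][0] * p_obs[0] + inv[0][1] * p_obs[1],
         inv[1][0] * p_obs[0] + inv[1][1] * p_obs[1]]
    # Inversion can yield small negative entries from sampling noise;
    # clipping and renormalising is a simple heuristic fix.
    p = [max(x, 0.0) for x in p]
    total = sum(p)
    return [x / total for x in p]


confusion = [[0.95, 0.10], [0.05, 0.90]]
p_obs = [0.355, 0.645]   # what a true [0.3, 0.7] distribution would look like after noise
print(correct_readout_1q(p_obs, confusion))  # approximately [0.3, 0.7]
```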

[PDF] mlr.press

Tighter problem-dependent regret bounds in reinforcement learning without domain knowledge using value function bounds

A Zanette, E Brunskill - International Conference on Machine …, 2019 - proceedings.mlr.press

Strong worst-case performance bounds for episodic reinforcement learning exist but
fortunately in practice RL algorithms perform much better than such bounds would predict …

Cited by 319 · All 8 versions

[PDF] mlr.press

Provably efficient RL with rich observations via latent state decoding

S Du, A Krishnamurthy, N Jiang… - International …, 2019 - proceedings.mlr.press

We study the exploration problem in episodic MDPs with rich observations generated from a
small number of latent states. Under certain identifiability assumptions, we demonstrate how …

Cited by 282 · All 5 versions

[PDF] jmlr.org

Deep exploration via randomized value functions

I Osband, B Van Roy, DJ Russo, Z Wen - Journal of Machine Learning …, 2019 - jmlr.org

We study the use of randomized value functions to guide deep exploration in reinforcement
learning. This offers an elegant means for synthesizing statistically and computationally …

Cited by 360 · All 8 versions

[PDF] neurips.cc

Unifying PAC and regret: Uniform PAC bounds for episodic reinforcement learning

C Dann, T Lattimore, E Brunskill - Advances in Neural …, 2017 - proceedings.neurips.cc

Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for
high-stakes applications like healthcare. This paper introduces a new framework for …

Cited by 339 · All 6 versions

[PDF] mlr.press

Why is posterior sampling better than optimism for reinforcement learning?

I Osband, B Van Roy - International conference on machine …, 2017 - proceedings.mlr.press

Computational results demonstrate that posterior sampling for reinforcement learning
(PSRL) dramatically outperforms existing algorithms driven by optimism, such as UCRL2 …

Cited by 286 · All 9 versions
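The snippet contrasts posterior sampling for reinforcement learning (PSRL) with optimism-driven methods such as UCRL2. Below is a minimal sketch of one PSRL planning step. It assumes a tabular finite-horizon MDP with known rewards and an independent Dirichlet prior on each transition row; the prior pseudo-count and all names are hypothetical. The step samples one plausible MDP from the posterior, solves it by backward induction, and then acts greedily for the episode. There is no explicit bonus term; exploration comes from the randomness of the sample.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) via normalised Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def psrl_policy(transition_counts, rewards, horizon, rng, prior=1.0):
    """One PSRL planning step: sample an MDP from the posterior and solve it.

    transition_counts[s][a][s2]: observed transition counts
    rewards[s][a]:               known mean rewards
    prior:                       Dirichlet pseudo-count added to every entry
    Returns the greedy policy pi[h][s] for the sampled MDP.
    """
    n_states = len(rewards)
    n_actions = len(rewards[0])
    sampled_p = [[sample_dirichlet([c + prior for c in transition_counts[s][a]], rng)
                  for a in range(n_actions)] for s in range(n_states)]
    v_next = [0.0] * n_states
    policy = [None] * horizon
    for h in reversed(range(horizon)):
        q = [[rewards[s][a] + sum(p * v for p, v in zip(sampled_p[s][a], v_next))
              for a in range(n_actions)] for s in range(n_states)]
        policy[h] = [max(range(n_actions), key=lambda a: q[s][a]) for s in range(n_states)]
        v_next = [max(q[s]) for s in range(n_states)]
    return policy


# State 1 is rewarding; in state 0, action 1 (almost surely) moves there.
rng = random.Random(1)
counts = [[[1000, 0], [0, 1000]], [[0, 1000], [0, 1000]]]
policy = psrl_policy(counts, [[0.0, 0.0], [1.0, 1.0]], horizon=3, rng=rng)
print(policy)
```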

[PDF] neurips.cc

Near-optimal regret bounds for reinforcement learning

P Auer, T Jaksch, R Ortner - Advances in neural information …, 2008 - proceedings.neurips.cc

For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider
the total regret of a learning algorithm with respect to an optimal policy. In order to describe …

Cited by 1636 · All 19 versions
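The snippet measures the total regret of a learner with respect to an optimal policy. For undiscounted MDPs this quantity is commonly written as follows, where ρ* denotes the optimal average reward and r_t the reward collected at step t. This notation is assumed here and is not quoted from the snippet.

```latex
\Delta(T) \;=\; T\rho^{*} \;-\; \sum_{t=1}^{T} r_t
```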

[PDF] jair.org

Graph kernels: A survey

G Nikolentzos, G Siglidis, M Vazirgiannis - Journal of Artificial Intelligence …, 2021 - jair.org

Graph kernels have attracted a lot of attention during the last decade, and have evolved into
a rapidly developing branch of learning on structured data. During the past 20 years, the …

Cited by 169 · All 8 versions
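The survey covers graph kernels as a family of methods. As one well-known member of that family, chosen here only for illustration, below is a minimal Weisfeiler-Lehman subtree kernel sketch. It assumes undirected graphs given as adjacency dicts plus node labels; the function names and graph encoding are hypothetical.

```python
from collections import Counter


def wl_features(adj, labels, iterations, table):
    """Weisfeiler-Lehman subtree features: label histograms across iterations.

    adj:    dict node -> list of neighbours
    labels: dict node -> initial label
    table:  dict shared across graphs, mapping (label, sorted neighbour labels) to new ids
    """
    current = dict(labels)
    features = Counter(("it0", lab) for lab in current.values())
    for it in range(1, iterations + 1):
        new = {}
        for v in adj:
            # Compress each node's label together with its neighbours' labels.
            signature = (current[v], tuple(sorted(current[u] for u in adj[v])))
            if signature not in table:
                table[signature] = len(table)
            new[v] = table[signature]
        current = new
        features.update((f"it{it}", lab) for lab in current.values())
    return features


def wl_kernel(g1, g2, iterations=2):
    """Dot product of the two graphs' WL feature histograms."""
    table = {}
    f1 = wl_features(*g1, iterations, table)
    f2 = wl_features(*g2, iterations, table)
    return sum(count * f2[key] for key, count in f1.items())


triangle = ({0: [1, 2], 1: [0, 2], 2: [0, 1]}, {0: "C", 1: "C", 2: "C"})
path = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "C"})
print(wl_kernel(triangle, path, iterations=1))  # 12
```

The shared `table` matters: both graphs must use the same relabelling dictionary so that identical neighbourhoods receive identical ids.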

Inequalities for the L1 deviation of the empirical distribution