[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Minimax regret bounds for reinforcement learning

MG Azar, I Osband, R Munos - International conference on …, 2017 - proceedings.mlr.press
We consider the problem of provably optimal exploration in reinforcement learning for finite
horizon MDPs. We show that an optimistic modification to value iteration achieves a regret …

Mitigation of readout noise in near-term quantum devices by classical post-processing based on detector tomography

FB Maciejewski, Z Zimborás, M Oszmaniec - Quantum, 2020 - quantum-journal.org
We propose a simple scheme to reduce readout errors in experiments on quantum systems
with finite number of measurement outcomes. Our method relies on performing classical …

Tighter problem-dependent regret bounds in reinforcement learning without domain knowledge using value function bounds

A Zanette, E Brunskill - International Conference on Machine …, 2019 - proceedings.mlr.press
Strong worst-case performance bounds for episodic reinforcement learning exist but
fortunately in practice RL algorithms perform much better than such bounds would predict …

Provably efficient RL with rich observations via latent state decoding

S Du, A Krishnamurthy, N Jiang… - International …, 2019 - proceedings.mlr.press
We study the exploration problem in episodic MDPs with rich observations generated from a
small number of latent states. Under certain identifiability assumptions, we demonstrate how …

Deep exploration via randomized value functions

I Osband, B Van Roy, DJ Russo, Z Wen - Journal of Machine Learning …, 2019 - jmlr.org
We study the use of randomized value functions to guide deep exploration in reinforcement
learning. This offers an elegant means for synthesizing statistically and computationally …

Unifying PAC and regret: Uniform PAC bounds for episodic reinforcement learning

C Dann, T Lattimore, E Brunskill - Advances in Neural …, 2017 - proceedings.neurips.cc
Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for
high-stakes applications like healthcare. This paper introduces a new framework for …

Why is posterior sampling better than optimism for reinforcement learning?

I Osband, B Van Roy - International conference on machine …, 2017 - proceedings.mlr.press
Computational results demonstrate that posterior sampling for reinforcement learning
(PSRL) dramatically outperforms existing algorithms driven by optimism, such as UCRL2 …

Near-optimal regret bounds for reinforcement learning

P Auer, T Jaksch, R Ortner - Advances in neural information …, 2008 - proceedings.neurips.cc
For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider
the total regret of a learning algorithm with respect to an optimal policy. In order to describe …

Graph kernels: A survey

G Nikolentzos, G Siglidis, M Vazirgiannis - Journal of Artificial Intelligence …, 2021 - jair.org
Graph kernels have attracted a lot of attention during the last decade, and have evolved into
a rapidly developing branch of learning on structured data. During the past 20 years, the …