- Academic Search

A Gupta, A Pacchiano, Y Zhai… - Advances in Neural …, 2022 - proceedings.neurips.cc

The success of reinforcement learning in a variety of challenging sequential decision-
making problems has been much discussed, but often ignored in this discussion is the …

Cited by 70

[PDF] neurips.cc

Model selection in contextual stochastic bandit problems

A Pacchiano, M Phan… - Advances in …, 2020 - proceedings.neurips.cc

We study bandit model selection in stochastic environments. Our approach relies on a
master algorithm that selects between candidate base algorithms. We develop a master …

Cited by 115

[PDF] mlr.press

Learning in POMDPs is sample-efficient with hindsight observability

J Lee, A Agarwal, C Dann… - … Conference on Machine …, 2023 - proceedings.mlr.press

POMDPs capture a broad class of decision making problems, but hardness results suggest
that learning is intractable even in simple settings due to the inherent partial observability …

Cited by 26

[PDF] mlr.press

A model selection approach for corruption robust reinforcement learning

CY Wei, C Dann, J Zimmert - International Conference on …, 2022 - proceedings.mlr.press

We develop a model selection approach to tackle reinforcement learning with adversarial
corruption in both transition and reward. For finite-horizon tabular MDPs, without prior …

Cited by 64

[PDF] mlr.press

A blackbox approach to best of both worlds in bandits and beyond

C Dann, CY Wei, J Zimmert - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press

Best-of-both-worlds algorithms for online learning which achieve near-optimal regret in both
the adversarial and the stochastic regimes have received growing attention recently …

Cited by 25

[PDF] mlr.press

Provable benefits of representational transfer in reinforcement learning

A Agarwal, Y Song, W Sun, K Wang… - The Thirty Sixth …, 2023 - proceedings.mlr.press

We study the problem of representational transfer in RL, where an agent first pretrains in a
number of source tasks to discover a shared representation, which is subsequently …

Cited by 33

[PDF] mlr.press

Reinforcement learning can be more efficient with multiple rewards

C Dann, Y Mansour, M Mohri - International Conference on …, 2023 - proceedings.mlr.press

Reward design is one of the most critical and challenging aspects when formulating a task
as a reinforcement learning (RL) problem. In practice, it often takes several attempts of …

Cited by 13

[PDF] neurips.cc

Best of both worlds model selection

A Pacchiano, C Dann, C Gentile - Advances in Neural …, 2022 - proceedings.neurips.cc

We study the problem of model selection in bandit scenarios in the presence of nested
policy classes, with the goal of obtaining simultaneous adversarial and stochastic ("best of …

Cited by 16

[PDF] neurips.cc

Experiment planning with function approximation

A Pacchiano, J Lee, E Brunskill - Advances in Neural …, 2024 - proceedings.neurips.cc

We study the problem of experiment planning with function approximation in contextual
bandit problems. In settings where there is a significant overhead to deploying adaptive …

Cited by 5

[PDF] mlr.press

Decentralized cooperative reinforcement learning with hierarchical information structure

H Kao, CY Wei, V Subramanian - … Conference on Algorithmic …, 2022 - proceedings.mlr.press

Multi-agent reinforcement learning (MARL) problems are challenging due to information
asymmetry. To overcome this challenge, existing methods often require high level of …

Cited by 21

Saved in My library

Dynamic balancing for model selection in bandits and RL

Unpacking reward shaping: Understanding the benefits of reward engineering on sample complexity

Model selection in contextual stochastic bandit problems

Learning in POMDPs is sample-efficient with hindsight observability

A model selection approach for corruption robust reinforcement learning

A blackbox approach to best of both worlds in bandits and beyond

Provable benefits of representational transfer in reinforcement learning

Reinforcement learning can be more efficient with multiple rewards

Best of both worlds model selection

Experiment planning with function approximation

Decentralized cooperative reinforcement learning with hierarchical information structure