Google Scholar

A survey of meta-reinforcement learning

J Beck, R Vuorio, EZ Liu, Z Xiong, L Zintgraf… - arXiv preprint arXiv …, 2023 - arxiv.org

While deep reinforcement learning (RL) has fueled multiple high-profile successes in
machine learning, it is held back from more widespread adoption by its often poor data …

Cited by 179 · All 2 versions

[PDF] mlr.press

Efficient reinforcement learning in block MDPs: A model-free representation learning approach

X Zhang, Y Song, M Uehara, M Wang… - International …, 2022 - proceedings.mlr.press

We present BRIEE, an algorithm for efficient reinforcement learning in Markov Decision
Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are …

Cited by 75 · All 4 versions

[PDF] neurips.cc

Bayesian decision-making under misspecified priors with applications to meta-learning

M Simchowitz, C Tosh… - Advances in neural …, 2021 - proceedings.neurips.cc

Thompson sampling and other Bayesian sequential decision-making algorithms are among
the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The …

Cited by 60 · All 7 versions

[PDF] mlr.press

Meta-Thompson sampling

B Kveton, M Konobeev, M Zaheer… - International …, 2021 - proceedings.mlr.press

Efficient exploration in bandits is a fundamental online learning problem. We propose a
variant of Thompson sampling that learns to explore better as it interacts with bandit …

Cited by 81 · All 9 versions

[PDF] arxiv.org

Offline multi-task transfer RL with representational penalization

A Bose, SS Du, M Fazel - arXiv preprint arXiv:2402.12570, 2024 - arxiv.org

We study the problem of representation transfer in offline Reinforcement Learning (RL),
where a learner has access to episodic data from a number of source tasks collected a …

Cited by 14 · All 3 versions

[PDF] mlr.press

Provable benefits of representational transfer in reinforcement learning

A Agarwal, Y Song, W Sun, K Wang… - The Thirty Sixth …, 2023 - proceedings.mlr.press

We study the problem of representational transfer in RL, where an agent first pretrains in a
number of source tasks to discover a shared representation, which is subsequently …

Cited by 33 · All 8 versions

[PDF] mlr.press

Hierarchical Bayesian bandits

J Hong, B Kveton, M Zaheer… - International …, 2022 - proceedings.mlr.press

Meta-, multi-task, and federated learning can be all viewed as solving similar tasks,
drawn from a distribution that reflects task similarities. We provide a unified view of all these …

Cited by 49 · All 4 versions

[PDF] neurips.cc

Provable benefit of multitask representation learning in reinforcement learning

Y Cheng, S Feng, J Yang, H Zhang… - Advances in Neural …, 2022 - proceedings.neurips.cc

As representation learning becomes a powerful technique to reduce sample complexity in
reinforcement learning (RL) in practice, theoretical understanding of its advantage is still …

Cited by 25 · All 6 versions

[PDF] arxiv.org

Probabilistic design of optimal sequential decision-making algorithms in learning and control

É Garrabé, G Russo - Annual Reviews in Control, 2022 - Elsevier

This survey is focused on certain sequential decision-making problems that involve
optimizing over probability functions. We discuss the relevance of these problems for …

Cited by 11 · All 5 versions

[PDF] neurips.cc

No regrets for learning the prior in bandits

S Basu, B Kveton, M Zaheer… - Advances in neural …, 2021 - proceedings.neurips.cc

We propose AdaTS, a Thompson sampling algorithm that adapts sequentially to
bandit tasks that it interacts with. The key idea in AdaTS is to adapt to an unknown task prior …

Cited by 40 · All 7 versions
