- Academic Search

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org

A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

Cited by 205

[PDF] neurips.cc

Bellman eluder dimension: New rich classes of RL problems, and sample-efficient algorithms

C Jin, Q Liu, S Miryoosefi - Advances in neural information …, 2021 - proceedings.neurips.cc

Finding the minimal structural assumptions that empower sample-efficient learning is one of
the most important research directions in Reinforcement Learning (RL). This paper …

Cited by 264

[PDF] tor-lattimore.com

[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3285

[PDF] arxiv.org

Representation learning for online and offline RL in low-rank MDPs

M Uehara, X Zhang, W Sun - arXiv preprint arXiv:2110.04652, 2021 - arxiv.org

This work studies the question of Representation Learning in RL: how can we learn a
compact low-dimensional representation such that on top of the representation we can …

Cited by 157

[PDF] arxiv.org

Top-k off-policy correction for a REINFORCE recommender system

M Chen, A Beutel, P Covington, S Jain… - Proceedings of the …, 2019 - dl.acm.org

Industrial recommender systems deal with extremely large action spaces--many millions of
items to recommend. Moreover, they need to serve billions of users, who are unique at any …

Cited by 544

[PDF] neurips.cc

FLAMBE: Structural complexity and representation learning of low rank MDPs

A Agarwal, S Kakade… - Advances in neural …, 2020 - proceedings.neurips.cc

In order to deal with the curse of dimensionality in reinforcement learning (RL), it is common
practice to make parametric assumptions where values or policies are functions of some low …

Cited by 296

[PDF] nowpublishers.com

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com

Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Cited by 1253

[PDF] datascienceassn.org

Concrete problems in AI safety

D Amodei, C Olah, J Steinhardt, P Christiano… - arXiv preprint arXiv …, 2016 - arxiv.org

Rapid progress in machine learning and artificial intelligence (AI) has brought increasing
attention to the potential impacts of AI technologies on society. In this paper we discuss one …

Cited by 3130

[PDF] abracadoudou.com

A study on overfitting in deep reinforcement learning

C Zhang, O Vinyals, R Munos, S Bengio - arXiv preprint arXiv:1804.06893, 2018 - arxiv.org

Recent years have witnessed significant progress in deep Reinforcement Learning (RL).
Empowered with large scale neural networks, carefully designed architectures, novel …

Cited by 510

[PDF] mlr.press

Neural contextual bandits with UCB-based exploration

D Zhou, L Li, Q Gu - International Conference on Machine …, 2020 - proceedings.mlr.press

We study the stochastic contextual bandit problem, where the reward is generated from an
unknown function with additive noise. No assumption is made about the reward function …

Cited by 294

Taming the monster: A fast and simple algorithm for contextual bandits

The statistical complexity of interactive decision making