The statistical complexity of interactive decision making
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …
Bellman eluder dimension: New rich classes of rl problems, and sample-efficient algorithms
Finding the minimal structural assumptions that empower sample-efficient learning is one of
the most important research directions in Reinforcement Learning (RL). This paper …
the most important research directions in Reinforcement Learning (RL). This paper …
[BOEK][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
and the multi-armed bandit model is a commonly used framework to address it. This …
Representation learning for online and offline rl in low-rank mdps
This work studies the question of Representation Learning in RL: how can we learn a
compact low-dimensional representation such that on top of the representation we can …
compact low-dimensional representation such that on top of the representation we can …
Top-k off-policy correction for a REINFORCE recommender system
Industrial recommender systems deal with extremely large action spaces--many millions of
items to recommend. Moreover, they need to serve billions of users, who are unique at any …
items to recommend. Moreover, they need to serve billions of users, who are unique at any …
Flambe: Structural complexity and representation learning of low rank mdps
In order to deal with the curse of dimensionality in reinforcement learning (RL), it is common
practice to make parametric assumptions where values or policies are functions of some low …
practice to make parametric assumptions where values or policies are functions of some low …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Concrete problems in AI safety
Rapid progress in machine learning and artificial intelligence (AI) has brought increasing
attention to the potential impacts of AI technologies on society. In this paper we discuss one …
attention to the potential impacts of AI technologies on society. In this paper we discuss one …
A study on overfitting in deep reinforcement learning
Recent years have witnessed significant progresses in deep Reinforcement Learning (RL).
Empowered with large scale neural networks, carefully designed architectures, novel …
Empowered with large scale neural networks, carefully designed architectures, novel …
Neural contextual bandits with ucb-based exploration
We study the stochastic contextual bandit problem, where the reward is generated from an
unknown function with additive noise. No assumption is made about the reward function …
unknown function with additive noise. No assumption is made about the reward function …