Dive into Deep Learning
This open-source book represents our attempt to make deep learning approachable,
teaching readers the concepts, the context, and the code. The entire book is drafted in …
Beyond UCB: Optimal and efficient contextual bandits with regression oracles
A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …
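The "regression oracles" of the title refer to reductions that turn a reward-regression routine into an exploration strategy. Below is a minimal sketch of inverse gap weighting, the action-sampling rule this line of work is built around; the oracle output `predicted_rewards` and the exploration parameter `gamma` are illustrative placeholders, not the paper's exact procedure:

    import numpy as np

    def igw_action_distribution(predicted_rewards, gamma):
        # Inverse gap weighting: convert a regression oracle's per-action
        # reward estimates into a sampling distribution. Actions whose
        # estimated gap to the best action is large get little probability.
        K = len(predicted_rewards)
        best = int(np.argmax(predicted_rewards))
        gaps = predicted_rewards[best] - predicted_rewards
        probs = 1.0 / (K + gamma * gaps)
        probs[best] = 0.0
        probs[best] = 1.0 - probs.sum()  # leftover mass on the greedy action
        return probs

    # Example: oracle predicts rewards [0.9, 0.5, 0.1] for three actions.
    p = igw_action_distribution(np.array([0.9, 0.5, 0.1]), gamma=10.0)
    action = np.random.choice(len(p), p=p)

With a regression oracle in hand, each round costs one oracle prediction and one arg-max, which is how the computational overhead stays comparable to supervised learning.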
Sample complexity of reinforcement learning using linearly combined model ensembles
Reinforcement learning (RL) methods have been shown to be capable of learning intelligent
behavior in rich domains. However, this has largely been done in simulated domains without …
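For concreteness, a linearly combined model ensemble posits that the unknown transition kernel is a weighted mixture of known base models. A toy tabular illustration, assuming dense kernels of shape (S, A, S) purely for exposition:

    import numpy as np

    def combined_transition(base_models, weights, s, a):
        # P(. | s, a) = sum_i w_i * P_i(. | s, a), with weights w_i >= 0
        # summing to 1; learning reduces to estimating the weight vector.
        return sum(w * P[s, a] for w, P in zip(weights, base_models))

The appeal is sample efficiency: the learner estimates a handful of mixture weights rather than a full transition model.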
Adapting to misspecification in contextual bandits
A major research direction in contextual bandits is to develop algorithms that are
computationally efficient, yet support flexible, general-purpose function approximation …
A model selection approach for corruption robust reinforcement learning
We develop a model selection approach to tackle reinforcement learning with adversarial
corruption in both transition and reward. For finite-horizon tabular MDPs, without prior …
Model selection in contextual stochastic bandit problems
A Pacchiano, M Phan… - Advances in …, 2020 - proceedings.neurips.cc
We study bandit model selection in stochastic environments. Our approach relies on a
master algorithm that selects between candidate base algorithms. We develop a master …
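As a rough illustration of the master/base architecture, here is a corralling-style master in the spirit of this literature; the smoothing and learning-rate details of the paper's actual master differ, and the base interface (`select`/`update`) is an assumed placeholder:

    import numpy as np

    class Exp3Master:
        # Toy master: sample a base algorithm each round, let it act,
        # then update importance-weighted scores over the bases.
        def __init__(self, bases, eta=0.1):
            self.bases = bases
            self.eta = eta
            self.scores = np.zeros(len(bases))

        def probs(self):
            w = np.exp(self.scores - self.scores.max())
            return w / w.sum()

        def step(self, context):
            p = self.probs()
            i = int(np.random.choice(len(self.bases), p=p))
            return i, self.bases[i].select(context), p[i]

        def update(self, i, p_i, context, action, reward):
            self.bases[i].update(context, action, reward)
            self.scores[i] += self.eta * reward / p_i  # importance weighting

Without extra smoothing, a naive master can starve base algorithms of plays, which is precisely the difficulty the model-selection literature addresses.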
Hedging the drift: Learning to optimize under nonstationarity
We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic
regret bounds for a collection of nonstationary stochastic bandit settings. These settings …
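To make the nonstationary setting concrete, a standard baseline (not claimed to be this paper's algorithm) is sliding-window UCB, which forgets observations older than a fixed window so estimates can track a drifting mean:

    import numpy as np
    from collections import deque

    class SlidingWindowUCB:
        # UCB statistics over only the most recent `window` plays, so
        # stale observations age out as the reward distribution drifts.
        def __init__(self, n_arms, window=500, c=2.0):
            self.history = deque(maxlen=window)  # (arm, reward) pairs
            self.n_arms = n_arms
            self.c = c

        def select(self):
            counts = np.zeros(self.n_arms)
            sums = np.zeros(self.n_arms)
            for arm, r in self.history:
                counts[arm] += 1
                sums[arm] += r
            t = max(len(self.history), 1)
            safe = np.maximum(counts, 1)
            ucb = np.where(counts > 0,
                           sums / safe + np.sqrt(self.c * np.log(t) / safe),
                           np.inf)  # force exploration of arms unseen in the window
            return int(np.argmax(ucb))

        def update(self, arm, reward):
            self.history.append((arm, reward))

The window length trades reactivity to drift against estimation variance, the core tension that dynamic-regret analyses quantify.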
Oracle inequalities for model selection in offline reinforcement learning
In offline reinforcement learning (RL), a learner leverages prior logged data to learn a good
policy without interacting with the environment. A major challenge in applying such methods …
Dynamic balancing for model selection in bandits and RL
We propose a framework for model selection by combining base algorithms in stochastic
bandits and reinforcement learning. We require a candidate regret bound for each base …
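The balancing idea can be sketched in a few lines: each base carries a candidate regret bound, the master plays whichever active base currently claims the smallest bound, and a base is eliminated when its realized reward falls detectably short of its own claim. Confidence widths and other details of the paper's test are omitted; this is an illustrative skeleton only:

    def balance_step(plays, rewards, bound_fns, active):
        # plays[i]: times base i was selected; rewards[i]: its cumulative
        # reward; bound_fns[i](n): base i's candidate regret bound at n plays.
        means = {i: rewards[i] / max(plays[i], 1) for i in active}
        best_mean = max(means.values())
        # misspecification test (confidence terms omitted): drop bases
        # underperforming their own candidate bound.
        active = {i for i in active
                  if means[i] + bound_fns[i](max(plays[i], 1)) / max(plays[i], 1)
                  >= best_mean}
        # balance: play the active base with the smallest putative bound.
        chosen = min(active, key=lambda i: bound_fns[i](plays[i] + 1))
        return chosen, active

A well-specified base survives such a test with high probability, so the master's regret tracks the best candidate bound up to the cost of balancing.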
Artificial intelligence for materials research at extremes
Materials development is slow and expensive, taking decades from inception to fielding. For
materials research at extremes, the situation is even more demanding, as the desired …