Unpacking reward shaping: Understanding the benefits of reward engineering on sample complexity

A Gupta, A Pacchiano, Y Zhai… - Advances in Neural …, 2022 - proceedings.neurips.cc
The success of reinforcement learning in a variety of challenging sequential
decision-making problems has been much discussed, but often ignored in this discussion is the …

Model selection in contextual stochastic bandit problems

A Pacchiano, M Phan… - Advances in …, 2020 - proceedings.neurips.cc
We study bandit model selection in stochastic environments. Our approach relies on a
master algorithm that selects between candidate base algorithms. We develop a master …

Learning in POMDPs is sample-efficient with hindsight observability

J Lee, A Agarwal, C Dann… - … Conference on Machine …, 2023 - proceedings.mlr.press
POMDPs capture a broad class of decision making problems, but hardness results suggest
that learning is intractable even in simple settings due to the inherent partial observability …

A model selection approach for corruption robust reinforcement learning

CY Wei, C Dann, J Zimmert - International Conference on …, 2022 - proceedings.mlr.press
We develop a model selection approach to tackle reinforcement learning with adversarial
corruption in both transition and reward. For finite-horizon tabular MDPs, without prior …

A blackbox approach to best of both worlds in bandits and beyond

C Dann, CY Wei, J Zimmert - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
Best-of-both-worlds algorithms for online learning which achieve near-optimal regret in both
the adversarial and the stochastic regimes have received growing attention recently …

Provable benefits of representational transfer in reinforcement learning

A Agarwal, Y Song, W Sun, K Wang… - The Thirty Sixth …, 2023 - proceedings.mlr.press
We study the problem of representational transfer in RL, where an agent first pretrains in a
number of source tasks to discover a shared representation, which is subsequently …

Reinforcement learning can be more efficient with multiple rewards

C Dann, Y Mansour, M Mohri - International Conference on …, 2023 - proceedings.mlr.press
Reward design is one of the most critical and challenging aspects when formulating a task
as a reinforcement learning (RL) problem. In practice, it often takes several attempts of …

Best of both worlds model selection

A Pacchiano, C Dann, C Gentile - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the problem of model selection in bandit scenarios in the presence of nested
policy classes, with the goal of obtaining simultaneous adversarial and stochastic ("best of …

Experiment planning with function approximation

A Pacchiano, J Lee, E Brunskill - Advances in Neural …, 2024 - proceedings.neurips.cc
We study the problem of experiment planning with function approximation in contextual
bandit problems. In settings where there is a significant overhead to deploying adaptive …

Decentralized cooperative reinforcement learning with hierarchical information structure

H Kao, CY Wei, V Subramanian - … Conference on Algorithmic …, 2022 - proceedings.mlr.press
Multi-agent reinforcement learning (MARL) problems are challenging due to information
asymmetry. To overcome this challenge, existing methods often require high level of …