The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $\varepsilon$-Greedy Exploration

S Zhang, H Li, M Wang, M Liu… - Advances in …, 2023 - proceedings.neurips.cc
This paper provides a theoretical understanding of deep Q-Network (DQN) with the
$\varepsilon$-greedy exploration in deep reinforcement learning. Despite the tremendous …

The sample complexity of online contract design

B Zhu, S Bates, Z Yang, Y Wang, J Jiao… - arXiv preprint arXiv …, 2022 - arxiv.org
We study the hidden-action principal-agent problem in an online setting. In each round, the
principal posts a contract that specifies the payment to the agent based on each outcome …

Online learning in Stackelberg games with an omniscient follower

G Zhao, B Zhu, J Jiao, M Jordan - … Conference on Machine …, 2023 - proceedings.mlr.press
We study the problem of online learning in a two-player decentralized cooperative
Stackelberg game. In each round, the leader first takes an action, followed by the follower …

MADE: Exploration via maximizing deviation from explored regions

T Zhang, P Rashidinejad, J Jiao… - Advances in …, 2021 - proceedings.neurips.cc
In online reinforcement learning (RL), efficient exploration remains particularly challenging
in high-dimensional environments with sparse rewards. In low-dimensional environments …

Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration

F Liu, L Viano, V Cevher - Advances in Neural Information …, 2022 - proceedings.neurips.cc
This paper provides a theoretical study of deep neural function approximation in
reinforcement learning (RL) with the $\epsilon$-greedy exploration under the online setting …

First steps toward understanding the extrapolation of nonlinear models to unseen domains

K Dong, T Ma - arXiv preprint arXiv:2211.11719, 2022 - arxiv.org
Real-world machine learning applications often involve deploying neural networks to
domains that are not seen in the training time. Hence, we need to understand the …

Lifting the information ratio: An information-theoretic analysis of Thompson sampling for contextual bandits

G Neu, I Olkhovskaia, M Papini… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual
bandits with binary losses and adversarially-selected contexts. We adapt the information …

Fast rates for nonparametric online learning: from realizability to learning in games

C Daskalakis, N Golowich - Proceedings of the 54th Annual ACM …, 2022 - dl.acm.org
We study fast rates of convergence in the setting of nonparametric online regression, namely
where regret is defined with respect to an arbitrary function class which has bounded …

Representation learning beyond linear prediction functions

Z Xu, A Tewari - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
Recent papers on the theory of representation learning have shown the importance of a
quantity called diversity when generalizing from a set of source tasks to a target task. Most of …