Doubly robust distributionally robust off-policy evaluation and learning

N Kallus, X Mao, K Wang… - … Conference on Machine …, 2022 - proceedings.mlr.press
Off-policy evaluation and learning (OPE/L) use offline observational data to make better
decisions, which is crucial in applications where online experimentation is limited. However …
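The doubly robust estimator named in this title is a standard construction in the OPE literature. Below is a minimal sketch of the classical (non-distributionally-robust) doubly robust value estimate for a contextual bandit with two actions; the function name, the binary-action setup, and the array layout are illustrative choices, not the paper's notation.

```python
import numpy as np

def doubly_robust_value(rewards, actions, propensities, q_hat, pi_probs):
    """Classical doubly robust off-policy value estimate (illustrative sketch).

    rewards:      observed rewards r_i, shape (n,)
    actions:      logged actions a_i in {0, 1}, shape (n,)
    propensities: behavior probabilities mu(a_i | x_i), shape (n,)
    q_hat:        outcome-model predictions q(x_i, a) for a in {0, 1}, shape (n, 2)
    pi_probs:     target-policy probabilities pi(a | x_i), shape (n, 2)
    """
    n = len(rewards)
    # Direct-method term: plug-in estimate of E_pi[q(x, a)]
    dm = (pi_probs * q_hat).sum(axis=1)
    # Importance-weighted correction on the logged action; vanishes
    # when the outcome model q_hat is exact
    w = pi_probs[np.arange(n), actions] / propensities
    correction = w * (rewards - q_hat[np.arange(n), actions])
    return float(np.mean(dm + correction))
```

The estimator is "doubly robust" in the usual sense: it is consistent if either the outcome model `q_hat` or the propensities are correct.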

Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality

Y Jin, Z Ren, Z Yang, Z Wang - arXiv preprint arXiv:2212.09900, 2022 - arxiv.org
This paper studies offline policy learning, which aims at utilizing observations collected a
priori (from either fixed or adaptively evolving behavior policies) to learn the optimal …

On sample-efficient offline reinforcement learning: Data diversity, posterior sampling and beyond

T Nguyen-Tang, R Arora - Advances in neural information …, 2024 - proceedings.neurips.cc
We seek to understand what facilitates sample-efficient learning from historical datasets for
sequential decision-making, a problem that is popularly known as offline reinforcement …

On instance-dependent bounds for offline reinforcement learning with linear function approximation

T Nguyen-Tang, M Yin, S Gupta, S Venkatesh… - Proceedings of the …, 2023 - ojs.aaai.org
Sample-efficient offline reinforcement learning (RL) with linear function approximation has
been studied extensively recently. Much of the prior work has yielded instance-independent …

Distributionally robust batch contextual bandits

N Si, F Zhang, Z Zhou, J Blanchet - Management Science, 2023 - pubsonline.informs.org
Policy learning using historical observational data is an important problem that has
widespread applications. Examples include selecting offers, prices, or advertisements for …
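Distributionally robust objectives of the kind this title refers to are commonly evaluated through a convex dual. As an illustration only (not the paper's estimator), the sketch below computes the worst-case mean reward over distributions within a KL divergence ball of radius delta around the empirical distribution, using the standard dual representation and a simple grid search over the dual variable.

```python
import numpy as np

def kl_robust_value(rewards, delta, alphas=None):
    """Worst-case mean reward over a KL ball of radius delta (illustrative sketch).

    Uses the standard dual form
        sup_{alpha > 0}  -alpha * log E[exp(-r / alpha)] - alpha * delta,
    approximated by a grid search over alpha. Assumes nonnegative rewards
    so that exp(-r / alpha) cannot overflow for small alpha.
    """
    if alphas is None:
        alphas = np.logspace(-3, 3, 2000)
    r = np.asarray(rewards, dtype=float)
    vals = [-a * np.log(np.mean(np.exp(-r / a))) - a * delta for a in alphas]
    return float(np.max(vals))
```

At delta = 0 the dual recovers the ordinary empirical mean; as delta grows, the value shrinks toward the worst observed reward.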

Adaptive linear estimating equations

M Ying, K Khamaru, CH Zhang - Advances in Neural …, 2023 - proceedings.neurips.cc
Sequential data collection has emerged as a widely adopted technique for enhancing the
efficiency of data gathering processes. Despite its advantages, such data collection …

Non-stationary representation learning in sequential linear bandits

Y Qin, T Menara, S Oymak, SN Ching… - IEEE Open Journal of …, 2022 - ieeexplore.ieee.org
In this paper, we study representation learning for multi-task decision-making in non-
stationary environments. We consider the framework of sequential linear bandits, where the …

Efficient online estimation of causal effects by deciding what to observe

S Gupta, Z Lipton, D Childers - Advances in Neural …, 2021 - proceedings.neurips.cc
Researchers often face data fusion problems, where multiple data sources are available,
each capturing a distinct subset of variables. While problem formulations typically take the …

Risk minimization from adaptively collected data: Guarantees for supervised and policy learning

A Bibaut, N Kallus, M Dimakopoulou… - Advances in neural …, 2021 - proceedings.neurips.cc
Empirical risk minimization (ERM) is the workhorse of machine learning, whether for
classification and regression or for off-policy policy learning, but its model-agnostic …

Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

W Mou, MJ Wainwright, PL Bartlett - arXiv preprint arXiv:2209.13075, 2022 - arxiv.org
The problem of estimating a linear functional based on observational data is canonical in
both the causal inference and bandit literatures. We analyze a broad class of two-stage …