Doubly robust distributionally robust off-policy evaluation and learning
Off-policy evaluation and learning (OPE/L) use offline observational data to make better
decisions, which is crucial in applications where online experimentation is limited. However …
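For context, a minimal sketch of the standard (non-distributionally-robust) doubly robust value estimator that this line of work builds on; the notation (outcome model \hat{\mu}, propensity estimate \hat{e}) is generic rather than taken from the paper:
\[
\hat{V}_{\mathrm{DR}}(\pi) \;=\; \frac{1}{n}\sum_{i=1}^{n}\Big[\hat{\mu}\big(X_i,\pi(X_i)\big) \;+\; \frac{\mathbf{1}\{A_i=\pi(X_i)\}}{\hat{e}(A_i\mid X_i)}\,\big(Y_i-\hat{\mu}(X_i,A_i)\big)\Big],
\]
which remains consistent if either the outcome model or the propensity model is correctly specified.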
Policy learning" without''overlap: Pessimism and generalized empirical Bernstein's inequality
This paper studies offline policy learning, which aims at utilizing observations collected a
priori (from either fixed or adaptively evolving behavior policies) to learn the optimal …
On sample-efficient offline reinforcement learning: Data diversity, posterior sampling and beyond
We seek to understand what facilitates sample-efficient learning from historical datasets for
sequential decision-making, a problem that is popularly known as offline reinforcement …
On instance-dependent bounds for offline reinforcement learning with linear function approximation
Sample-efficient offline reinforcement learning (RL) with linear function approximation has
been studied extensively recently. Much of the prior work has yielded instance-independent …
Distributionally robust batch contextual bandits
Policy learning using historical observational data is an important problem that has
widespread applications. Examples include selecting offers, prices, or advertisements for …
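As a reference point for what "distributionally robust" means in this setting, the worst-case expected reward over a KL ball of radius \delta around the data-generating distribution admits the standard dual form (a general DRO identity, stated here from the broader literature rather than quoted from the paper):
\[
\inf_{Q:\,D_{\mathrm{KL}}(Q\,\|\,P)\le\delta}\ \mathbb{E}_{Q}[Y]
\;=\;\sup_{\alpha>0}\Big\{-\alpha\log\mathbb{E}_{P}\big[e^{-Y/\alpha}\big]-\alpha\delta\Big\},
\]
so the robust value reduces to a one-dimensional optimization over \alpha on top of an ordinary value estimate.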
Adaptive linear estimating equations
Sequential data collection has emerged as a widely adopted technique for enhancing the
efficiency of data gathering processes. Despite its advantages, such data collection …
Non-stationary representation learning in sequential linear bandits
In this paper, we study representation learning for multi-task decision-making in non-
stationary environments. We consider the framework of sequential linear bandits, where the …
Efficient online estimation of causal effects by deciding what to observe
Researchers often face data fusion problems, where multiple data sources are available,
each capturing a distinct subset of variables. While problem formulations typically take the …
Risk minimization from adaptively collected data: Guarantees for supervised and policy learning
Empirical risk minimization (ERM) is the workhorse of machine learning, whether for
classification and regression or for off-policy policy learning, but its model-agnostic …
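A minimal Python sketch of the vanilla inverse-propensity-weighted off-policy ERM baseline that such guarantees concern (function and array names are illustrative, not from the paper, which studies when objectives of this kind remain valid under adaptively collected data):

```python
import numpy as np

def ipw_policy_value(policy, contexts, actions, rewards, propensities):
    """Inverse-propensity-weighted (IPW) estimate of a candidate policy's value
    from logged bandit data (all inputs are illustrative NumPy arrays/callables)."""
    chosen = np.array([policy(x) for x in contexts])   # actions the candidate would take
    match = (chosen == actions).astype(float)          # agreement with logged actions
    return np.mean(match * rewards / propensities)     # reweighted average reward

def ipw_erm(candidate_policies, contexts, actions, rewards, propensities):
    """Off-policy 'empirical risk minimization': pick the candidate policy
    with the highest estimated value."""
    values = [ipw_policy_value(pi, contexts, actions, rewards, propensities)
              for pi in candidate_policies]
    return candidate_policies[int(np.argmax(values))]
```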
Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency
The problem of estimating a linear functional based on observational data is canonical in
both the causal inference and bandit literatures. We analyze a broad class of two-stage …
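For concreteness, the average treatment effect is a canonical example of such a linear functional, and the usual two-stage recipe (first estimate the nuisances \hat{\mu}_a and \hat{e}, then plug them into the influence-function form) yields the familiar AIPW estimator, stated generically here rather than as the paper's exact procedure:
\[
\hat{\tau}_{\mathrm{AIPW}}
=\frac{1}{n}\sum_{i=1}^{n}\Big[\hat{\mu}_1(X_i)-\hat{\mu}_0(X_i)
+\frac{A_i}{\hat{e}(X_i)}\big(Y_i-\hat{\mu}_1(X_i)\big)
-\frac{1-A_i}{1-\hat{e}(X_i)}\big(Y_i-\hat{\mu}_0(X_i)\big)\Big].
\]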