Doubly robust distributionally robust off-policy evaluation and learning

N Kallus, X Mao, K Wang… - … Conference on Machine …, 2022 - proceedings.mlr.press
Off-policy evaluation and learning (OPE/L) use offline observational data to make better
decisions, which is crucial in applications where online experimentation is limited. However …
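The doubly robust estimator named in this title is a standard construction in the OPE literature. Below is a minimal sketch of the classical (non-distributionally-robust) doubly robust value estimate for a contextual bandit with two actions; the function name, the binary-action setup, and the array layout are illustrative choices, not the paper's notation.

```python
import numpy as np

def doubly_robust_value(rewards, actions, propensities, q_hat, pi_probs):
    """Classical doubly robust off-policy value estimate (illustrative sketch).

    rewards:      observed rewards r_i, shape (n,)
    actions:      logged actions a_i in {0, 1}, shape (n,)
    propensities: behavior probabilities mu(a_i | x_i), shape (n,)
    q_hat:        outcome-model predictions q(x_i, a) for a in {0, 1}, shape (n, 2)
    pi_probs:     target-policy probabilities pi(a | x_i), shape (n, 2)
    """
    n = len(rewards)
    # Direct-method term: plug-in estimate of E_pi[q(x, a)]
    dm = (pi_probs * q_hat).sum(axis=1)
    # Importance-weighted correction on the logged action; vanishes
    # when the outcome model q_hat is exact
    w = pi_probs[np.arange(n), actions] / propensities
    correction = w * (rewards - q_hat[np.arange(n), actions])
    return float(np.mean(dm + correction))
```

The estimator is "doubly robust" in the usual sense: it is consistent if either the outcome model `q_hat` or the propensities are correct.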

Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality

Y Jin, Z Ren, Z Yang, Z Wang - arXiv preprint arXiv:2212.09900, 2022 - arxiv.org
This paper studies offline policy learning, which aims at utilizing observations collected a
priori (from either fixed or adaptively evolving behavior policies) to learn the optimal …

On sample-efficient offline reinforcement learning: Data diversity, posterior sampling and beyond

T Nguyen-Tang, R Arora - Advances in neural information …, 2024 - proceedings.neurips.cc
We seek to understand what facilitates sample-efficient learning from historical datasets for
sequential decision-making, a problem that is popularly known as offline reinforcement …

On instance-dependent bounds for offline reinforcement learning with linear function approximation

T Nguyen-Tang, M Yin, S Gupta, S Venkatesh… - Proceedings of the …, 2023 - ojs.aaai.org
Sample-efficient offline reinforcement learning (RL) with linear function approximation has
been studied extensively recently. Much of the prior work has yielded instance-independent …

Distributionally robust batch contextual bandits

N Si, F Zhang, Z Zhou, J Blanchet - Management Science, 2023 - pubsonline.informs.org
Policy learning using historical observational data is an important problem that has
widespread applications. Examples include selecting offers, prices, or advertisements for …
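Distributionally robust objectives of the kind this title refers to are commonly evaluated through a convex dual. As an illustration only (not the paper's estimator), the sketch below computes the worst-case mean reward over distributions within a KL divergence ball of radius delta around the empirical distribution, using the standard dual representation and a simple grid search over the dual variable.

```python
import numpy as np

def kl_robust_value(rewards, delta, alphas=None):
    """Worst-case mean reward over a KL ball of radius delta (illustrative sketch).

    Uses the standard dual form
        sup_{alpha > 0}  -alpha * log E[exp(-r / alpha)] - alpha * delta,
    approximated by a grid search over alpha. Assumes nonnegative rewards
    so that exp(-r / alpha) cannot overflow for small alpha.
    """
    if alphas is None:
        alphas = np.logspace(-3, 3, 2000)
    r = np.asarray(rewards, dtype=float)
    vals = [-a * np.log(np.mean(np.exp(-r / a))) - a * delta for a in alphas]
    return float(np.max(vals))
```

At delta = 0 the dual recovers the ordinary empirical mean; as delta grows, the value shrinks toward the worst observed reward.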

Adaptive linear estimating equations

M Ying, K Khamaru, CH Zhang - Advances in Neural …, 2023 - proceedings.neurips.cc
Sequential data collection has emerged as a widely adopted technique for enhancing the
efficiency of data gathering processes. Despite its advantages, such data collection …

Non-stationary representation learning in sequential linear bandits

Y Qin, T Menara, S Oymak, SN Ching… - IEEE Open Journal of …, 2022 - ieeexplore.ieee.org
In this paper, we study representation learning for multi-task decision-making in non-
stationary environments. We consider the framework of sequential linear bandits, where the …

Efficient online estimation of causal effects by deciding what to observe

S Gupta, Z Lipton, D Childers - Advances in Neural …, 2021 - proceedings.neurips.cc
Researchers often face data fusion problems, where multiple data sources are available,
each capturing a distinct subset of variables. While problem formulations typically take the …

Risk minimization from adaptively collected data: Guarantees for supervised and policy learning

A Bibaut, N Kallus, M Dimakopoulou… - Advances in neural …, 2021 - proceedings.neurips.cc
Empirical risk minimization (ERM) is the workhorse of machine learning, whether for
classification and regression or for off-policy policy learning, but its model-agnostic …

Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

W Mou, MJ Wainwright, PL Bartlett - arXiv preprint arXiv:2209.13075, 2022 - arxiv.org
The problem of estimating a linear functional based on observational data is canonical in
both the causal inference and bandit literatures. We analyze a broad class of two-stage …