Near-minimax-optimal risk-sensitive reinforcement learning with CVaR

K Wang, N Kallus, W Sun - International Conference on …, 2023 - proceedings.mlr.press
In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective
of Conditional Value at Risk (CVaR) with risk tolerance $\tau$. Starting with multi-arm …

The benefits of being distributional: Small-loss bounds for reinforcement learning

K Wang, K Zhou, R Wu, N Kallus… - Advances in neural …, 2023 - proceedings.neurips.cc
While distributional reinforcement learning (DistRL) has been empirically effective, the
question of when and why it is better than vanilla, non-distributional RL has remained …

Distributionally robust policy gradient for offline contextual bandits

Z Yang, Y Guo, P Xu, A Liu… - International …, 2023 - proceedings.mlr.press
Learning an optimal policy from offline data is notoriously challenging, which requires the
evaluation of the learning policy using data pre-collected from a static logging policy. We …

The central role of the loss function in reinforcement learning

K Wang, N Kallus, W Sun - arXiv preprint arXiv:2409.12799, 2024 - arxiv.org
This paper illustrates the central role of loss functions in data-driven decision making,
providing a comprehensive survey on their influence in cost-sensitive classification (CSC) …

Policy learning under biased sample selection

L Lei, R Sahoo, S Wager - arXiv preprint arXiv:2304.11735, 2023 - arxiv.org
Practitioners often use data from a randomized controlled trial to learn a treatment
assignment policy that can be deployed on a target population. A recurring concern in doing …

Policy learning for localized interventions from observational data

MG Marmarelis, F Morstatter… - International …, 2024 - proceedings.mlr.press
A largely unaddressed problem in causal inference is that of learning reliable policies in
continuous, high-dimensional treatment variables from observational data. Especially in the …

Uncertainty-aware instance reweighting for off-policy learning

X Zhang, J Chen, H Wang, H Xie… - Advances in Neural …, 2023 - proceedings.neurips.cc
Off-policy learning, referring to the procedure of policy optimization with access only to
logged feedback data, has shown importance in various important real-world applications …

Online policy optimization for robust MDP

J Dong, J Li, B Wang, J Zhang - arXiv preprint arXiv:2209.13841, 2022 - arxiv.org
Reinforcement learning (RL) has exceeded human performance in many synthetic settings
such as video games and Go. However, real-world deployment of end-to-end RL models is …

Distributional shift-aware off-policy interval estimation: A unified error quantification framework

W Zhou, Y Li, R Zhu, A Qu - arXiv preprint arXiv:2309.13278, 2023 - arxiv.org
We study high-confidence off-policy evaluation in the context of infinite-horizon Markov
decision processes, where the objective is to establish a confidence interval (CI) for the …

Provable risk-sensitive distributional reinforcement learning with general function approximation

Y Chen, X Zhang, S Wang, L Huang - arXiv preprint arXiv:2402.18159, 2024 - arxiv.org
In the realm of reinforcement learning (RL), accounting for risk is crucial for making
decisions under uncertainty, particularly in applications where safety and reliability are …