Google Scholar

Near-minimax-optimal risk-sensitive reinforcement learning with CVaR

K Wang, N Kallus, W Sun - International Conference on …, 2023 - proceedings.mlr.press

In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective
of Conditional Value at Risk (CVaR) with risk tolerance $\tau$. Starting with multi-arm …

Cited by 23 - All 8 versions

[PDF] neurips.cc

The benefits of being distributional: Small-loss bounds for reinforcement learning‏

K Wang, K Zhou, R Wu, N Kallus… - Advances in neural …, 2023‏ - proceedings.neurips.cc‏

While distributional reinforcement learning (DistRL) has been empirically effective, the
question of when and why it is better than vanilla, non-distributional RL has remained …‏

Cited by 20 - All 8 versions

[PDF] mlr.press

Distributionally robust policy gradient for offline contextual bandits‏

Z Yang, Y Guo, P Xu, A Liu… - International …, 2023‏ - proceedings.mlr.press‏

Learning an optimal policy from offline data is notoriously challenging, which requires the
evaluation of the learning policy using data pre-collected from a static logging policy. We …‏

Cited by 13 - All 2 versions

[PDF] arxiv.org

The central role of the loss function in reinforcement learning‏

K Wang, N Kallus, W Sun - arXiv preprint arXiv:2409.12799, 2024 - arxiv.org

This paper illustrates the central role of loss functions in data-driven decision making,
providing a comprehensive survey on their influence in cost-sensitive classification (CSC) …‏

Cited by 5 - All 2 versions

[PDF] arxiv.org

Policy learning under biased sample selection‏

L Lei, R Sahoo, S Wager - arXiv preprint arXiv:2304.11735, 2023 - arxiv.org

Practitioners often use data from a randomized controlled trial to learn a treatment
assignment policy that can be deployed on a target population. A recurring concern in doing …‏

Cited by 18 - All 5 versions

[PDF] mlr.press

Policy learning for localized interventions from observational data‏

MG Marmarelis, F Morstatter… - International …, 2024‏ - proceedings.mlr.press‏

A largely unaddressed problem in causal inference is that of learning reliable policies in
continuous, high-dimensional treatment variables from observational data. Especially in the …‏

Cited by 2

[PDF] neurips.cc

Uncertainty-aware instance reweighting for off-policy learning‏

X Zhang, J Chen, H Wang, H Xie… - Advances in Neural …, 2023 - proceedings.neurips.cc

Off-policy learning, referring to the procedure of policy optimization with access only to
logged feedback data, has shown importance in various important real-world applications …‏

Cited by 7 - All 7 versions

[PDF] arxiv.org

Online policy optimization for robust MDP

J Dong, J Li, B Wang, J Zhang - arXiv preprint arXiv:2209.13841, 2022 - arxiv.org

Reinforcement learning (RL) has exceeded human performance in many synthetic settings
such as video games and Go. However, real-world deployment of end-to-end RL models is …‏

Cited by 18 - All 7 versions

[PDF] arxiv.org

Distributional shift-aware off-policy interval estimation: A unified error quantification framework‏

W Zhou, Y Li, R Zhu, A Qu - arXiv preprint arXiv:2309.13278, 2023 - arxiv.org

We study high-confidence off-policy evaluation in the context of infinite-horizon Markov
decision processes, where the objective is to establish a confidence interval (CI) for the …‏

Cited by 3 - All 3 versions

[PDF] arxiv.org

Provable risk-sensitive distributional reinforcement learning with general function approximation‏

Y Chen, X Zhang, S Wang, L Huang - arXiv preprint arXiv:2402.18159, 2024 - arxiv.org

In the realm of reinforcement learning (RL), accounting for risk is crucial for making
decisions under uncertainty, particularly in applications where safety and reliability are …‏

Cited by 5 - All 6 versions

Doubly robust distributionally robust off-policy evaluation and learning

Near-minimax-optimal risk-sensitive reinforcement learning with cvar‏
