Near-minimax-optimal risk-sensitive reinforcement learning with CVaR
In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective
of Conditional Value at Risk (CVaR) with risk tolerance $\tau$. Starting with multi-arm …
The benefits of being distributional: Small-loss bounds for reinforcement learning
While distributional reinforcement learning (DistRL) has been empirically effective, the
question of when and why it is better than vanilla, non-distributional RL has remained …
Distributionally robust policy gradient for offline contextual bandits
Learning an optimal policy from offline data is notoriously challenging, which requires the
evaluation of the learning policy using data pre-collected from a static logging policy. We …
The central role of the loss function in reinforcement learning
This paper illustrates the central role of loss functions in data-driven decision making,
providing a comprehensive survey on their influence in cost-sensitive classification (CSC) …
Policy learning under biased sample selection
Practitioners often use data from a randomized controlled trial to learn a treatment
assignment policy that can be deployed on a target population. A recurring concern in doing …
Policy learning for localized interventions from observational data
A largely unaddressed problem in causal inference is that of learning reliable policies in
continuous, high-dimensional treatment variables from observational data. Especially in the …
Uncertainty-aware instance reweighting for off-policy learning
Off-policy learning, referring to the procedure of policy optimization with access only to
logged feedback data, has shown importance in various important real-world applications …
Online policy optimization for robust MDP
Reinforcement learning (RL) has exceeded human performance in many synthetic settings
such as video games and Go. However, real-world deployment of end-to-end RL models is …
Distributional shift-aware off-policy interval estimation: A unified error quantification framework
We study high-confidence off-policy evaluation in the context of infinite-horizon Markov
decision processes, where the objective is to establish a confidence interval (CI) for the …
Provable risk-sensitive distributional reinforcement learning with general function approximation
In the realm of reinforcement learning (RL), accounting for risk is crucial for making
decisions under uncertainty, particularly in applications where safety and reliability are …