A survey on causal reinforcement learning
While reinforcement learning (RL) achieves tremendous success in sequential decision-making
problems of many domains, it still faces key challenges of data inefficiency and the …
On the opportunities and challenges of offline reinforcement learning for recommender systems
Reinforcement learning serves as a potent tool for modeling dynamic user interests within
recommender systems, garnering increasing research attention of late. However, a …
Provably mitigating overoptimization in RLHF: Your SFT loss is implicitly an adversarial regularizer
Aligning generative models with human preference via RLHF typically suffers from
overoptimization, where an imperfectly learned reward model can misguide the generative …
Structure in deep reinforcement learning: A survey and open problems
Reinforcement Learning (RL), bolstered by the expressive capabilities of Deep Neural
Networks (DNNs) for function approximation, has demonstrated considerable success in …
Provably efficient causal reinforcement learning with confounded observational data
Empowered by neural networks, deep reinforcement learning (DRL) achieves tremendous
empirical success. However, DRL requires a large dataset by interacting with the …
A minimax learning approach to off-policy evaluation in confounded partially observable Markov decision processes
We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes
(POMDPs), where the evaluation policy depends only on observable variables and the …
Proximal reinforcement learning: Efficient off-policy evaluation in partially observed Markov decision processes
In applications of offline reinforcement learning to observational data, such as in healthcare
or education, a general concern is that observed actions might be affected by unobserved …
An instrumental variable approach to confounded off-policy evaluation
Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-collected
observational data generated by a potentially different behavior policy. In many …
Causal reinforcement learning: A survey
Reinforcement learning is an essential paradigm for solving sequential decision problems
under uncertainty. Despite many remarkable achievements in recent decades, applying …
Minimax Instrumental Variable Regression and Convergence Guarantees without Identification or Closedness
In this paper, we study nonparametric estimation of instrumental variable (IV) regressions.
Recently, many flexible machine learning methods have been developed for instrumental …