A review of uncertainty for deep reinforcement learning
Uncertainty is ubiquitous in games, both in the agents playing games and often in the games
themselves. Working with uncertainty is therefore an important component of successful …
Reinforcement learning applied to wastewater treatment process control optimization: Approaches, challenges, and path forward
Wastewater treatment process control optimization is a complex task in a highly nonlinear
environment. Reinforcement learning (RL) is a machine learning technique that stands out …
Mildly conservative Q-learning for offline reinforcement learning
Offline reinforcement learning (RL) defines the task of learning from a static logged dataset
without continually interacting with the environment. The distribution shift between the …
RORL: Robust offline reinforcement learning via conservative smoothing
Offline reinforcement learning (RL) provides a promising direction to exploit massive amounts
of offline data for complex decision-making tasks. Due to the distribution shift issue, current …
A policy-guided imitation approach for offline reinforcement learning
Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-
based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution …
Model-Bellman inconsistency for model-based offline reinforcement learning
For offline reinforcement learning (RL), model-based methods are expected to be data-
efficient as they incorporate dynamics models to generate more data. However, due to …
Reinforcement learning with human feedback: Learning dynamic choices via pessimism
In this paper, we study offline Reinforcement Learning with Human Feedback (RLHF) where
we aim to learn the human's underlying reward and the MDP's optimal policy from a set of …
Offline multi-agent reinforcement learning with implicit global-to-local value regularization
Offline reinforcement learning (RL) has received considerable attention in recent years due
to its attractive capability of learning policies from offline datasets without environmental …
What is essential for unseen goal generalization of offline goal-conditioned RL?
Offline goal-conditioned RL (GCRL) offers a way to train general-purpose agents from fully
offline datasets. In addition to being conservative within the dataset, the generalization …
Corruption-robust offline reinforcement learning with general function approximation
We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …