Regularizing hidden states enables learning generalizable reward model for LLMs
Reward models trained on human preference data have been proven to effectively align
Large Language Models (LLMs) with human intent within the framework of reinforcement …
Learning robotic navigation from experience: principles, methods and recent results
Navigation is one of the most heavily studied problems in robotics and is conventionally
approached as a geometric mapping and planning problem. However, real-world navigation …
Goal-conditioned imitation learning using score-based diffusion policies
We propose a new policy representation based on score-based diffusion models (SDMs).
We apply our new policy representation in the domain of Goal-Conditioned Imitation …
HIQL: Offline goal-conditioned RL with latent states as actions
Unsupervised pre-training has recently become the bedrock for computer vision and natural
language processing. In reinforcement learning (RL), goal-conditioned RL can potentially …
RORL: Robust offline reinforcement learning via conservative smoothing
Offline reinforcement learning (RL) provides a promising direction to exploit massive amount
of offline data for complex decision-making tasks. Due to the distribution shift issue, current …
Inference via interpolation: Contrastive representations provably enable planning and inference
Given time series data, how can we answer questions like "what will happen in the
future?" and "how did we get here?" These sorts of probabilistic inference questions are …
A policy-guided imitation approach for offline reinforcement learning
Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-
based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution …
From play to policy: Conditional behavior generation from uncurated robot data
While large-scale sequence modeling from offline data has led to impressive performance
gains in natural language and image generation, directly translating such ideas to robotics …
Hierarchical diffusion for offline decision making
Offline reinforcement learning typically introduces a hierarchical structure to solve the long-
horizon problem so as to address its thorny issue of variance accumulation. Problems of …
Rewards-in-context: Multi-objective alignment of foundation models with dynamic preference adjustment
We consider the problem of multi-objective alignment of foundation models with human
preferences, which is a critical step towards helpful and harmless AI systems. However, it is …