Regularizing hidden states enables learning generalizable reward model for LLMs

R Yang, R Ding, Y Lin, H Zhang… - Advances in Neural …, 2025 - proceedings.neurips.cc
Reward models trained on human preference data have been proven to effectively align
Large Language Models (LLMs) with human intent within the framework of reinforcement …

Learning robotic navigation from experience: principles, methods and recent results

S Levine, D Shah - … Transactions of the Royal Society B, 2023 - royalsocietypublishing.org
Navigation is one of the most heavily studied problems in robotics and is conventionally
approached as a geometric mapping and planning problem. However, real-world navigation …

Goal-conditioned imitation learning using score-based diffusion policies

M Reuss, M Li, X Jia, R Lioutikov - arXiv preprint arXiv:2304.02532, 2023 - arxiv.org
We propose a new policy representation based on score-based diffusion models (SDMs).
We apply our new policy representation in the domain of Goal-Conditioned Imitation …

HIQL: Offline goal-conditioned RL with latent states as actions

S Park, D Ghosh, B Eysenbach… - Advances in Neural …, 2023 - proceedings.neurips.cc
Unsupervised pre-training has recently become the bedrock for computer vision and natural
language processing. In reinforcement learning (RL), goal-conditioned RL can potentially …

RORL: Robust offline reinforcement learning via conservative smoothing

R Yang, C Bai, X Ma, Z Wang… - Advances in neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) provides a promising direction to exploit massive amounts
of offline data for complex decision-making tasks. Due to the distribution shift issue, current …

Inference via interpolation: Contrastive representations provably enable planning and inference

B Eysenbach, V Myers… - Advances in Neural …, 2025 - proceedings.neurips.cc
Given time series data, how can we answer questions like "What will happen in the
future?" and "How did we get here?" These sorts of probabilistic inference questions are …

A policy-guided imitation approach for offline reinforcement learning

H Xu, L Jiang, J Li… - Advances in neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-
based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution …

From play to policy: Conditional behavior generation from uncurated robot data

ZJ Cui, Y Wang, NMM Shafiullah, L Pinto - arXiv preprint arXiv:2210.10047, 2022 - arxiv.org
While large-scale sequence modeling from offline data has led to impressive performance
gains in natural language and image generation, directly translating such ideas to robotics …

Hierarchical diffusion for offline decision making

W Li, X Wang, B Jin, H Zha - International Conference on …, 2023 - proceedings.mlr.press
Offline reinforcement learning typically introduces a hierarchical structure to solve the long-
horizon problem so as to address its thorny issue of variance accumulation. Problems of …

Rewards-in-context: Multi-objective alignment of foundation models with dynamic preference adjustment

R Yang, X Pan, F Luo, S Qiu, H Zhong, D Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
We consider the problem of multi-objective alignment of foundation models with human
preferences, which is a critical step towards helpful and harmless AI systems. However, it is …