HIQL: Offline Goal-Conditioned RL with Latent States as Actions

S Park, D Ghosh, B Eysenbach… - Advances in Neural …, 2023 - proceedings.neurips.cc
Unsupervised pre-training has recently become the bedrock for computer vision and natural
language processing. In reinforcement learning (RL), goal-conditioned RL can potentially …

Goal-Conditioned Reinforcement Learning: Problems and Solutions

M Liu, M Zhu, W Zhang - arXiv preprint arXiv:2201.08299, 2022 - arxiv.org
Goal-conditioned reinforcement learning (GCRL), related to a set of complex RL problems,
trains an agent to achieve different goals under particular scenarios. Compared to the …

Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning

T Wang, A Torralba, P Isola… - … Conference on Machine …, 2023 - proceedings.mlr.press
In goal-reaching reinforcement learning (RL), the optimal value function has a particular
geometry, called quasimetric structure. This paper introduces Quasimetric Reinforcement …
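The quasimetric property named in the snippet above can be checked on a toy example. This is a minimal sketch, not the paper's method, assuming a deterministic environment with a reward of -1 per step until the goal is reached, so that the optimal value is the negative of the minimum step count. The graph, `shortest_step_counts`, and `is_quasimetric` are hypothetical names for illustration. Shortest step counts satisfy d(s, s) = 0 and the triangle inequality but need not be symmetric, which is what separates a quasimetric from a metric.

```python
from itertools import product

INF = float("inf")


def shortest_step_counts(n, edges):
    """All-pairs minimum number of steps (Floyd-Warshall) on a directed graph."""
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v in edges:
        d[u][v] = min(d[u][v], 1)  # one action moves from u to v
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def is_quasimetric(d):
    """Zero self-distance plus the triangle inequality; symmetry is NOT required."""
    n = len(d)
    zero_diag = all(d[i][i] == 0 for i in range(n))
    triangle = all(
        d[i][j] <= d[i][k] + d[k][j] for i, j, k in product(range(n), repeat=3)
    )
    return zero_diag and triangle


# Hypothetical 4-state environment: a one-way cycle 0 -> 1 -> 2 -> 3 -> 0
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
d = shortest_step_counts(4, edges)
print(d[0][3], d[3][0])   # 3 1  (reaching 3 from 0 is slower than the reverse)
print(is_quasimetric(d))  # True
```

Under the stated reward assumption, V*(s, g) = -d[s][g], so the asymmetry of d carries over directly to the optimal goal-conditioned value.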

METRA: Scalable Unsupervised RL with Metric-Aware Abstraction

S Park, O Rybkin, S Levine - arXiv preprint arXiv:2310.08887, 2023 - arxiv.org
Unsupervised pre-training strategies have proven to be highly effective in natural language
processing and computer vision. Likewise, unsupervised reinforcement learning (RL) holds …

Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making

V Myers, C Zheng, A Dragan, S Levine… - arXiv preprint arXiv …, 2024 - arxiv.org
Temporal distances lie at the heart of many algorithms for planning, control, and
reinforcement learning that involve reaching goals, allowing one to estimate the transit time …

Preference-Grounded Token-Level Guidance for Language Model Fine-Tuning

S Yang, S Zhang, C Xia, Y Feng… - Advances in Neural …, 2023 - proceedings.neurips.cc
Aligning language models (LMs) with preferences is an important problem in natural
language generation. A key challenge is that preferences are typically provided at the …

HumanMimic: Learning Natural Locomotion and Transitions for Humanoid Robot via Wasserstein Adversarial Imitation

A Tang, T Hiraoka, N Hiraoka, F Shi… - … on Robotics and …, 2024 - ieeexplore.ieee.org
Transferring human motion skills to humanoid robots remains a significant challenge. In this
study, we introduce a Wasserstein adversarial imitation learning system, allowing humanoid …

How Far I'll Go: Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression

YJ Ma, J Yan, D Jayaraman, O Bastani - arXiv preprint arXiv:2206.03023, 2022 - arxiv.org
Offline goal-conditioned reinforcement learning (GCRL) promises general-purpose skill
learning in the form of reaching diverse goals from purely offline datasets. We propose …

Contrastive Difference Predictive Coding

C Zheng, R Salakhutdinov, B Eysenbach - arXiv preprint arXiv:2310.20141, 2023 - arxiv.org
Predicting and reasoning about the future lie at the heart of many time-series questions. For
example, goal-conditioned reinforcement learning can be viewed as learning …

Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-Oriented Dialogue Systems

Y Feng, S Yang, S Zhang, J Zhang, C Xiong… - arXiv preprint arXiv …, 2023 - arxiv.org
When learning task-oriented dialogue (ToD) agents, reinforcement learning (RL) techniques
can naturally be utilized to train dialogue strategies to achieve user-specific goals. Prior …