Diffusion model alignment using direct preference optimization
Large language models (LLMs) are fine-tuned using human comparison data with
Reinforcement Learning from Human Feedback (RLHF) methods to make them better …
Inverse preference learning: Preference-based RL without a reward function
Reward functions are difficult to design and often hard to align with human intent. Preference-
based Reinforcement Learning (RL) algorithms address these problems by learning reward …
CEIL: Generalized contextual imitation learning
In this paper, we present ContExtual Imitation Learning (CEIL), a general and broadly
applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight …
Discriminator-weighted offline imitation learning from suboptimal demonstrations
We study the problem of offline Imitation Learning (IL) where an agent aims to learn an
optimal expert behavior policy without additional online environment interactions. Instead …
Learning agile skills via adversarial imitation of rough partial demonstrations
Learning agile skills is one of the main challenges in robotics. To this end, reinforcement
learning approaches have achieved impressive results. These methods require explicit task …
Extreme Q-learning: MaxEnt RL without entropy
Modern Deep Reinforcement Learning (RL) algorithms require estimates of the maximal Q-
value, which are difficult to compute in continuous domains with an infinite number of …
Benchmarks and algorithms for offline preference-based reward learning
Learning a reward function from human preferences is challenging as it typically requires
having a high-fidelity simulator or using expensive and potentially unsafe actual physical …
SkillDiffuser: Interpretable hierarchical planning via skill abstractions in diffusion-based task execution
Diffusion models have demonstrated strong potential for robotic trajectory planning.
However, generating coherent trajectories from high-level instructions remains challenging …
Inverse reinforcement learning as the algorithmic basis for theory of mind: current methods and open problems
Theory of mind (ToM) is the psychological construct by which we model another's internal
mental states. Through ToM, we adjust our own behaviour to best suit a social context, and …
Maximum-likelihood inverse reinforcement learning with finite-time guarantees
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated
optimal policy that best fits observed sequences of states and actions implemented by an …