Vision-language models are zero-shot reward models for reinforcement learning

J Rocamonde, V Montesinos, E Nava, E Perez… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning (RL) requires either manually specifying a reward function, which is
often infeasible, or learning a reward model from a large amount of human feedback, which …

RAM: Retrieval-based affordance transfer for generalizable zero-shot robotic manipulation

Y Kuang, J Ye, H Geng, J Mao, C Deng… - arXiv preprint arXiv …, 2024 - arxiv.org
This work proposes a retrieve-and-transfer framework for zero-shot robotic manipulation,
dubbed RAM, featuring generalizability across various objects, environments, and …

Active preference-based Gaussian process regression for reward learning and optimization

E Bıyık, N Huynh, MJ Kochenderfer… - … Journal of Robotics …, 2024 - journals.sagepub.com
Designing reward functions is a difficult task in AI and robotics. The complex task of directly
specifying all the desirable behaviors a robot needs to optimize often proves challenging for …

Vision-language models as a source of rewards

K Baumli, S Baveja, F Behbahani, H Chan… - arXiv preprint arXiv …, 2023 - arxiv.org
Building generalist agents that can accomplish many goals in rich open-ended
environments is one of the research frontiers for reinforcement learning. A key limiting factor …

RL-VLM-F: Reinforcement learning from vision language foundation model feedback

Y Wang, Z Sun, J Zhang, Z Xian, E Biyik, D Held… - arXiv preprint arXiv …, 2024 - arxiv.org
Reward engineering has long been a challenge in Reinforcement Learning (RL) research,
as it often requires extensive human effort and iterative processes of trial-and-error to design …

Integrating reinforcement learning with foundation models for autonomous robotics: Methods and perspectives

A Moroncelli, V Soni, AA Shahid, M Maccarini… - arXiv preprint arXiv …, 2024 - arxiv.org
Foundation models (FMs), large deep learning models pre-trained on vast, unlabeled
datasets, exhibit powerful capabilities in understanding complex patterns and generating …

LLM-empowered state representation for reinforcement learning

B Wang, Y Qu, Y Jiang, J Shao, C Liu, W Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Conventional state representations in reinforcement learning often omit critical task-related
details, presenting a significant challenge for value networks in establishing accurate …

Vision-language model-based human-robot collaboration for smart manufacturing: A state-of-the-art survey

J Fan, Y Yin, T Wang, W Dong, P Zheng… - Frontiers of Engineering …, 2025 - Springer
Human-robot collaboration (HRC) is set to transform the manufacturing paradigm by
leveraging the strengths of human flexibility and robot precision. The recent breakthrough of …

Evolution and prospects of foundation models: From large language models to large multimodal models

Z Chen, L Xu, H Zheng, L Chen… - Computers …, 2024 - search.ebscohost.com
Since the 1950s, when the Turing Test was introduced, there has been notable progress in
machine language intelligence. Language modeling, crucial for AI development, has …

EPO: Hierarchical LLM agents with environment preference optimization

Q Zhao, H Fu, C Sun, G Konidaris - arXiv preprint arXiv:2408.16090, 2024 - arxiv.org
Long-horizon decision-making tasks present significant challenges for LLM-based agents
due to the need for extensive planning over multiple steps. In this paper, we propose a …