Reinforcement learning: An overview

K Murphy - arXiv preprint arXiv:2412.05265, 2024 - arxiv.org
This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement
learning and sequential decision making, covering value-based RL, policy-gradient …

Position: video as the new language for real-world decision making

S Yang, JC Walker, J Parker-Holder, Y Du… - … on Machine Learning, 2024 - openreview.net
Both text and video data are abundant on the internet and support large-scale self-
supervised learning through next token or frame prediction. However, they have not been …

Integrating reinforcement learning with foundation models for autonomous robotics: Methods and perspectives

A Moroncelli, V Soni, AA Shahid, M Maccarini… - arXiv preprint arXiv …, 2024 - arxiv.org
Foundation models (FMs), large deep learning models pre-trained on vast, unlabeled
datasets, exhibit powerful capabilities in understanding complex patterns and generating …

Improving dynamic object interactions in text-to-video generation with AI feedback

H Furuta, H Zen, D Schuurmans, A Faust… - arXiv preprint arXiv …, 2024 - arxiv.org
Large text-to-video models hold immense potential for a wide range of downstream
applications. However, these models struggle to accurately depict dynamic object …

Vision Language Models are In-Context Value Learners

YJ Ma, J Hejna, C Fu, D Shah, J Liang, Z Xu… - The Thirteenth …, 2024 - openreview.net
Predicting temporal progress from visual trajectories is important for intelligent robots that
can learn, adapt, and improve. However, learning such progress estimator, or temporal …

Automated Rewards via LLM-Generated Progress Functions

V Sarukkai, B Shacklett, Z Majercik, K Bhatia… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have the potential to automate reward engineering by
leveraging their broad domain knowledge across various tasks. However, they often need …

A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards

S Patel, X Yin, W Huang, S Garg, H Nayyeri… - arXiv preprint arXiv …, 2025 - arxiv.org
Task specification for robotic manipulation in open-world environments is challenging,
requiring flexible and adaptive objectives that align with human intentions and can evolve …

VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving

Z Huang, Z Sheng, Y Qu, J You, S Chen - arXiv preprint arXiv:2412.15544, 2024 - arxiv.org
In recent years, reinforcement learning (RL)-based methods for learning driving policies
have gained increasing attention in the autonomous driving community and have achieved …

Dreaming to Assist: Learning to Align with Human Objectives for Shared Control in High-Speed Racing

J DeCastro, A Silva, D Gopinath, E Sumner… - arXiv preprint arXiv …, 2024 - arxiv.org
Tight coordination is required for effective human-robot teams in domains involving fast
dynamics and tactical decisions, such as multi-car racing. In such settings, robot teammates …

Online Preference-based Reinforcement Learning with Self-augmented Feedback from Large Language Model

S Tu, J Sun, Q Zhang, X Lan, D Zhao - arXiv preprint arXiv:2412.16878, 2024 - arxiv.org
Preference-based reinforcement learning (PbRL) provides a powerful paradigm to avoid
meticulous reward engineering by learning rewards based on human preferences …