- Academic Search

Reinforcement learning: An overview

K Murphy - arXiv preprint arXiv:2412.05265, 2024 - arxiv.org

This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement
learning and sequential decision making, covering value-based RL, policy-gradient …

Cited by 1. All 2 versions.

Position: video as the new language for real-world decision making

S Yang, JC Walker, J Parker-Holder, Y Du… - … on Machine Learning, 2024 - openreview.net

Both text and video data are abundant on the internet and support large-scale self-
supervised learning through next token or frame prediction. However, they have not been …

Cited by 7. All 5 versions.

Integrating reinforcement learning with foundation models for autonomous robotics: Methods and perspectives

A Moroncelli, V Soni, AA Shahid, M Maccarini… - arXiv preprint arXiv …, 2024 - arxiv.org

Foundation models (FMs), large deep learning models pre-trained on vast, unlabeled
datasets, exhibit powerful capabilities in understanding complex patterns and generating …

Cited by 1. All 2 versions.

Improving dynamic object interactions in text-to-video generation with AI feedback

H Furuta, H Zen, D Schuurmans, A Faust… - arXiv preprint arXiv …, 2024 - arxiv.org

Large text-to-video models hold immense potential for a wide range of downstream
applications. However, these models struggle to accurately depict dynamic object …

Cited by 2. All 2 versions.

Vision Language Models are In-Context Value Learners

YJ Ma, J Hejna, C Fu, D Shah, J Liang, Z Xu… - The Thirteenth …, 2024 - openreview.net

Predicting temporal progress from visual trajectories is important for intelligent robots that
can learn, adapt, and improve. However, learning such progress estimator, or temporal …

All 3 versions.

Automated Rewards via LLM-Generated Progress Functions

V Sarukkai, B Shacklett, Z Majercik, K Bhatia… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) have the potential to automate reward engineering by
leveraging their broad domain knowledge across various tasks. However, they often need …

Cited by 1. All 2 versions.

A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards

S Patel, X Yin, W Huang, S Garg, H Nayyeri… - arXiv preprint arXiv …, 2025 - arxiv.org

Task specification for robotic manipulation in open-world environments is challenging,
requiring flexible and adaptive objectives that align with human intentions and can evolve …

All 2 versions.

VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving

Z Huang, Z Sheng, Y Qu, J You, S Chen - arXiv preprint arXiv:2412.15544, 2024 - arxiv.org

In recent years, reinforcement learning (RL)-based methods for learning driving policies
have gained increasing attention in the autonomous driving community and have achieved …

Cited by 1. All 2 versions.

Dreaming to Assist: Learning to Align with Human Objectives for Shared Control in High-Speed Racing

J DeCastro, A Silva, D Gopinath, E Sumner… - arXiv preprint arXiv …, 2024 - arxiv.org

Tight coordination is required for effective human-robot teams in domains involving fast
dynamics and tactical decisions, such as multi-car racing. In such settings, robot teammates …

All 3 versions.

Online Preference-based Reinforcement Learning with Self-augmented Feedback from Large Language Model

S Tu, J Sun, Q Zhang, X Lan, D Zhao - arXiv preprint arXiv:2412.16878, 2024 - arxiv.org

Preference-based reinforcement learning (PbRL) provides a powerful paradigm to avoid
meticulous reward engineering by learning rewards based on human preferences …

All 2 versions.

Code as reward: Empowering reinforcement learning with VLMs
