Reinforcement learning: An overview
K Murphy - arXiv preprint arXiv:2412.05265, 2024 - arxiv.org
This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement
learning and sequential decision making, covering value-based RL, policy-gradient …
Position: video as the new language for real-world decision making
Both text and video data are abundant on the internet and support large-scale self-
supervised learning through next token or frame prediction. However, they have not been …
Integrating reinforcement learning with foundation models for autonomous robotics: Methods and perspectives
Foundation models (FMs), large deep learning models pre-trained on vast, unlabeled
datasets, exhibit powerful capabilities in understanding complex patterns and generating …
Improving dynamic object interactions in text-to-video generation with AI feedback
Large text-to-video models hold immense potential for a wide range of downstream
applications. However, these models struggle to accurately depict dynamic object …
Vision Language Models are In-Context Value Learners
Predicting temporal progress from visual trajectories is important for intelligent robots that
can learn, adapt, and improve. However, learning such progress estimator, or temporal …
Automated Rewards via LLM-Generated Progress Functions
Large Language Models (LLMs) have the potential to automate reward engineering by
leveraging their broad domain knowledge across various tasks. However, they often need …
A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards
Task specification for robotic manipulation in open-world environments is challenging,
requiring flexible and adaptive objectives that align with human intentions and can evolve …
VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving
In recent years, reinforcement learning (RL)-based methods for learning driving policies
have gained increasing attention in the autonomous driving community and have achieved …
Dreaming to Assist: Learning to Align with Human Objectives for Shared Control in High-Speed Racing
Tight coordination is required for effective human-robot teams in domains involving fast
dynamics and tactical decisions, such as multi-car racing. In such settings, robot teammates …
Online Preference-based Reinforcement Learning with Self-augmented Feedback from Large Language Model
Preference-based reinforcement learning (PbRL) provides a powerful paradigm to avoid
meticulous reward engineering by learning rewards based on human preferences …