A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - researchgate.net
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …

SERL: A software suite for sample-efficient robotic reinforcement learning

J Luo, Z Hu, C Xu, YL Tan, J Berg… - … on Robotics and …, 2024 - ieeexplore.ieee.org
In recent years, significant progress has been made in the field of robotic reinforcement
learning (RL), enabling methods that handle complex image observations, train in the real …

Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning

J Luo, C Xu, J Wu, S Levine - arXiv preprint arXiv:2410.21845, 2024 - arxiv.org
Reinforcement learning (RL) holds great promise for enabling autonomous acquisition of
complex robotic manipulation skills, but realizing this potential in real-world settings has …

VANP: Learning where to see for navigation with self-supervised vision-action pre-training

M Nazeri, J Wang, A Payandeh… - 2024 IEEE/RSJ …, 2024 - ieeexplore.ieee.org
Humans excel at efficiently navigating through crowds without collision by focusing on
specific visual regions relevant to navigation. However, most robotic visual navigation …

Deep imitative reinforcement learning with gradient conflict-free for decision-making in autonomous vehicles

Z Shan, J Zhao, W Huang, Y Zhao, L Ge… - … Research Part C …, 2025 - Elsevier
As autonomous driving technology advances, researchers are focusing on utilizing expert
priors to improve the agents for learning-based decision-making in autonomous vehicles …

Reinforcement Learning From Imperfect Corrective Actions And Proxy Rewards

Z Jiang, X Feng, P Weng, Y Zhu, Y Song… - arXiv preprint arXiv …, 2024 - arxiv.org
In practice, reinforcement learning (RL) agents are often trained with a possibly imperfect
proxy reward function, which may lead to a human-agent alignment issue (i.e., the learned …

MILE: Model-based Intervention Learning

Y Korkmaz, E Bıyık - arXiv preprint arXiv:2502.13519, 2025 - arxiv.org
Imitation learning techniques have been shown to be highly effective in real-world control
scenarios, such as robotics. However, these approaches not only suffer from compounding …

ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy

Y Chen, S Tian, S Liu, Y Zhou, H Li, D Zhao - arXiv preprint arXiv …, 2025 - arxiv.org
Vision-Language-Action (VLA) models have shown substantial potential in real-world
robotic manipulation. However, fine-tuning these models through supervised learning …

Disentangling syntactics, semantics, and pragmatics in natural language processing

X Zhang - 2024 - dr.ntu.edu.sg
In the era of deep learning, the natural language processing (NLP) community has become
increasingly reliant on large language models (LLMs), which are essentially probabilistic …

Decentralized Cooperative Multi-Agent Deep Reinforcement Learning for Real-Time Optimization of Emulated Ad-Hoc Radio Networks

T Möhlenhof, N Jansen - MILCOM 2024 - 2024 IEEE Military …, 2024 - ieeexplore.ieee.org
This research explores the application of policy gradient methods in multi-agent
reinforcement learning, augmented with offline reinforcement learning techniques. The goal …