A Survey of Sim-to-Real Methods in RL: Progress, Prospects and Challenges with Foundation Models

L Da, J Turnau, TP Kutralingam, A Velasquez… - arXiv preprint arXiv …, 2025 - arxiv.org
Deep Reinforcement Learning (RL) has been explored and verified to be effective in solving
decision-making tasks in various domains, such as robotics, transportation, recommender …

DPM: Dual Preferences-based Multi-Agent Reinforcement Learning

S Kang, Y Lee, M Kim, J Oh, S Chong, SY Yun - 2024 - openreview.net
Preference-based Reinforcement Learning (PbRL), which optimizes reward functions using
preference feedback, is a promising approach for environments where handcrafted reward …

VLP: Vision-Language Preference Learning for Embodied Manipulation

R Liu, C Bai, J Lyu, S Sun, Y Du, X Li - arXiv preprint arXiv:2502.11918, 2025 - arxiv.org
Reward engineering is one of the key challenges in Reinforcement Learning (RL).
Preference-based RL effectively addresses this issue by learning from human feedback …

RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision

G Xiong, Q Jin, X Wang, Y Fang, H Liu, Y Yang… - arXiv preprint arXiv …, 2025 - arxiv.org
Retrieval-augmented generation (RAG) has shown great potential for knowledge-intensive
tasks, but its traditional architectures rely on static retrieval, limiting their effectiveness for …

Multilinguality in LLM-Designed Reward Functions for Restless Bandits: Effects on Task Performance and Fairness

A Parthasarathy, C Subramanian, G Senrayan… - arXiv preprint arXiv …, 2025 - arxiv.org
Restless Multi-Armed Bandits (RMABs) have been successfully applied to resource
allocation problems in a variety of settings, including public health. With the rapid …