Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment

C Wang, Z Zhao, Y Jiang, Z Chen, C Zhu… - arXiv preprint arXiv …, 2025 - arxiv.org
Recent advances in large language models (LLMs) have demonstrated significant progress
in performing complex tasks. While Reinforcement Learning from Human Feedback (RLHF) …

MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization

K Zhu, P Xia, Y Li, H Zhu, S Wang, H Yao - arXiv preprint arXiv …, 2024 - arxiv.org
The advancement of Large Vision-Language Models (LVLMs) has propelled their
application in the medical field. However, Medical LVLMs (Med-LVLMs) encounter factuality …

DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control

J Wen, Y Zhu, J Li, Z Tang, C Shen, F Feng - arXiv preprint arXiv …, 2025 - arxiv.org
Enabling robots to perform diverse tasks across varied environments is a central challenge
in robot learning. While vision-language-action (VLA) models have shown promise for …