Wei Fu
Verified email at mails.tsinghua.edu.cn
Title
Cited by
Year
Is DPO superior to PPO for LLM alignment? A comprehensive study
S Xu, W Fu, J Gao, W Ye, W Liu, Z Mei, G Wang, C Yu, Y Wu
arXiv preprint arXiv:2404.10719, 2024
Cited by 71 · 2024
Revisiting some common practices in cooperative multi-agent reinforcement learning
W Fu, C Yu, Z Xu, J Yang, Y Wu
arXiv preprint arXiv:2206.07505, 2022
Cited by 42 · 2022
Continuously discovering novel strategies via reward-switching policy optimization
Z Zhou, W Fu, B Zhang, Y Wu
arXiv preprint arXiv:2204.02246, 2022
Cited by 33 · 2022
Learning agile bipedal motions on a quadrupedal robot
Y Li, J Li, W Fu, Y Wu
2024 IEEE International Conference on Robotics and Automation (ICRA), 9735-9742, 2024
Cited by 9 · 2024
SRL: Scaling distributed reinforcement learning to over ten thousand cores
Z Mei, W Fu, J Gao, G Wang, H Zhang, Y Wu
arXiv preprint arXiv:2306.16688, 2023
Cited by 5 · 2023
Iteratively learn diverse strategies with state distance information
W Fu, W Du, J Li, S Chen, J Zhang, Y Wu
Advances in Neural Information Processing Systems 36, 2024
Cited by 4 · 2024
ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation
Z Mei, W Fu, K Li, G Wang, H Zhang, Y Wu
arXiv preprint arXiv:2406.14088, 2024
Cited by 3 · 2024
On designing effective RL reward at training time for LLM reasoning
J Gao, S Xu, W Ye, W Liu, C He, W Fu, Z Mei, G Wang, Y Wu
arXiv preprint arXiv:2410.15115, 2024
Cited by 2 · 2024
Iteratively learning novel strategies with diversity measured in state distances
W Fu, W Du, J Li, S Chen, J Zhang, Y Wu
Cited by 1 · 2023
Unlocking the Potential of MAPPO with Asynchronous Optimization
W Fu, C Yu, Y Li, Y Wu
Artificial Intelligence: First CAAI International Conference, CICAI 2021 …, 2021
2021