Wei Xiong
Verified email at illinois.edu - Homepage
Title
Cited by
Year
Raft: Reward ranked finetuning for generative foundation model alignment
H Dong, W Xiong, D Goyal, Z Yihan, C Winnie, R Pan, S Diao, J Zhang, ...
TMLR, 2023; selected for presentation at ICLR 2025
Cited by 326 · 2023
Iterative preference learning from human feedback: Bridging theory and practice for rlhf under kl-constraint
W Xiong, H Dong, C Ye, Z Wang, H Zhong, H Ji, N Jiang, T Zhang
ICML 2024, 2023
Cited by 122* · 2023
Mitigating the Alignment Tax of RLHF
Y Lin, H Lin, W Xiong, S Diao, J Liu, J Zhang, R Pan, H Wang, W Hu, ...
EMNLP 2024, 2023
Cited by 102* · 2023
RLHF Workflow: From Reward Modeling to Online RLHF
H Dong, W Xiong, B Pang, H Wang, H Zhao, Y Zhou, N Jiang, D Sahoo, ...
TMLR, 2024
Cited by 76* · 2024
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts
H Wang, W Xiong, T Xie, H Zhao, T Zhang
EMNLP 2024, 2024
Cited by 73 · 2024
A posterior sampling framework for interactive decision making
H Zhong, W Xiong, S Zheng, L Wang, Z Wang, Z Yang, T Zhang
arXiv preprint arXiv:2211.01962, 2022
Cited by 61* · 2022
Lmflow: An extensible toolkit for finetuning and inference of large foundation models
S Diao, R Pan, H Dong, KS Shum, J Zhang, W Xiong, T Zhang
NAACL 2024, Best Demo Paper Award, 2023
Cited by 52 · 2023
Nearly minimax optimal offline reinforcement learning with linear function approximation: Single-agent mdp and markov game
W Xiong, H Zhong, C Shi, C Shen, L Wang, T Zhang
ICLR 2023, 2022
Cited by 49 · 2022
Pessimistic minimax value iteration: Provably efficient equilibrium learning from offline datasets
H Zhong, W Xiong, J Tan, L Wang, T Zhang, Z Wang, Z Yang
ICML 2022, 2022
Cited by 49 · 2022
Decentralized multi-player multi-armed bandits with no collision information
C Shi, W Xiong, C Shen, J Yang
AISTATS 2020, 2020
Cited by 44 · 2020
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
H Wang, Y Lin, W Xiong, R Yang, S Diao, S Qiu, H Zhao, T Zhang
ACL 2024, 2024
Cited by 43 · 2024
Maximize to explore: One objective function fusing estimation, planning, and exploration
Z Liu, M Lu, W Xiong, H Zhong, H Hu, S Zhang, S Zheng, Z Yang, Z Wang
NeurIPS 2023, 2024
Cited by 33* · 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
H Zhong, G Feng, W Xiong, L Zhao, D He, J Bian, L Wang
arXiv preprint arXiv:2404.18922, 2024
Cited by 31 · 2024
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
C Ye, W Xiong, Y Zhang, N Jiang, T Zhang
NeurIPS 2024, 2024
Cited by 30* · 2024
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
C Ye, W Xiong, Q Gu, T Zhang
ICML 2023, 2022
Cited by 28 · 2022
A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
W Xiong, H Zhong, C Shi, C Shen, T Zhang
ICML 2022, 2022
Cited by 27 · 2022
Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization
C Shi, W Xiong, C Shen, J Yang
NeurIPS 2021, 2021
Cited by 25 · 2021
Distributional reinforcement learning for multi-dimensional reward functions
P Zhang, X Chen, L Zhao, W Xiong, T Qin, TY Liu
NeurIPS 2021, 2021
Cited by 24 · 2021
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
R Pi, T Han, W Xiong, J Zhang, R Liu, R Pan, T Zhang
ECCV 2024, 2024
Cited by 21 · 2024
PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction
H Ye, W Xiong, T Zhang
arXiv preprint arXiv:2012.15010, 2020
Cited by 17 · 2020
Articles 1–20