Wei Xiong
Verified email at illinois.edu - Homepage
Title
Cited by
Year
Raft: Reward ranked finetuning for generative foundation model alignment
H Dong, W Xiong, D Goyal, Z Yihan, C Winnie, R Pan, S Diao, J Zhang, ...
TMLR, 2023; selected for presentation at ICLR 2025
Cited by 326 · 2023
Iterative preference learning from human feedback: Bridging theory and practice for rlhf under kl-constraint
W Xiong, H Dong, C Ye, Z Wang, H Zhong, H Ji, N Jiang, T Zhang
ICML 2024, 2023
Cited by 122* · 2023
Mitigating the Alignment Tax of RLHF
Y Lin, H Lin, W Xiong, S Diao, J Liu, J Zhang, R Pan, H Wang, W Hu, ...
EMNLP 2024, 2023
Cited by 102* · 2023
RLHF Workflow: From Reward Modeling to Online RLHF
H Dong, W Xiong, B Pang, H Wang, H Zhao, Y Zhou, N Jiang, D Sahoo, ...
TMLR, 2024
Cited by 76* · 2024
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts
H Wang, W Xiong, T Xie, H Zhao, T Zhang
EMNLP 2024, 2024
Cited by 73 · 2024
A posterior sampling framework for interactive decision making
H Zhong, W Xiong, S Zheng, L Wang, Z Wang, Z Yang, T Zhang
arXiv preprint arXiv:2211.01962, 2022
Cited by 61* · 2022
Lmflow: An extensible toolkit for finetuning and inference of large foundation models
S Diao, R Pan, H Dong, KS Shum, J Zhang, W Xiong, T Zhang
NAACL 2024, Best Demo Paper Award, 2023
Cited by 52 · 2023
Nearly minimax optimal offline reinforcement learning with linear function approximation: Single-agent mdp and markov game
W Xiong, H Zhong, C Shi, C Shen, L Wang, T Zhang
ICLR 2023, 2022
Cited by 49 · 2022
Pessimistic minimax value iteration: Provably efficient equilibrium learning from offline datasets
H Zhong, W Xiong, J Tan, L Wang, T Zhang, Z Wang, Z Yang
ICML 2022, 2022
Cited by 49 · 2022
Decentralized multi-player multi-armed bandits with no collision information
C Shi, W Xiong, C Shen, J Yang
AISTATS 2020, 2020
Cited by 44 · 2020
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
H Wang, Y Lin, W Xiong, R Yang, S Diao, S Qiu, H Zhao, T Zhang
ACL 2024, 2024
Cited by 43 · 2024
Maximize to explore: One objective function fusing estimation, planning, and exploration
Z Liu, M Lu, W Xiong, H Zhong, H Hu, S Zhang, S Zheng, Z Yang, Z Wang
NeurIPS 2023, 2024
Cited by 33* · 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
H Zhong, G Feng, W Xiong, L Zhao, D He, J Bian, L Wang
arXiv preprint arXiv:2404.18922, 2024
Cited by 31 · 2024
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
C Ye, W Xiong, Y Zhang, N Jiang, T Zhang
NeurIPS 2024, 2024
Cited by 30* · 2024
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
C Ye, W Xiong, Q Gu, T Zhang
ICML 2023, 2022
Cited by 28 · 2022
A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
W Xiong, H Zhong, C Shi, C Shen, T Zhang
ICML 2022, 2022
Cited by 27 · 2022
Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization
C Shi, W Xiong, C Shen, J Yang
NeurIPS 2021, 2021
Cited by 25 · 2021
Distributional reinforcement learning for multi-dimensional reward functions
P Zhang, X Chen, L Zhao, W Xiong, T Qin, TY Liu
NeurIPS 2021, 2021
Cited by 24 · 2021
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
R Pi, T Han, W Xiong, J Zhang, R Liu, R Pan, T Zhang
ECCV 2024, 2024
Cited by 21 · 2024
PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction
H Ye, W Xiong, T Zhang
arXiv preprint arXiv:2012.15010, 2020
Cited by 17 · 2020
Articles 1–20