Zhihan Liu
Title · Cited by · Year
Reason for future, act for now: A principled architecture for autonomous LLM agents
Z Liu, H Hu, S Zhang, H Guo, S Ke, B Liu, Z Wang
Forty-first International Conference on Machine Learning, 2023
Cited by 40* · 2023
Maximize to explore: One objective function fusing estimation, planning, and exploration
Z Liu, M Lu, W Xiong, H Zhong, H Hu, S Zhang, S Zheng, Z Yang, Z Wang
Advances in Neural Information Processing Systems 36, 2024
Cited by 33* · 2024
Provably mitigating overoptimization in RLHF: Your SFT loss is implicitly an adversarial regularizer
Z Liu, M Lu, S Zhang, B Liu, H Guo, Y Yang, J Blanchet, Z Wang
arXiv preprint arXiv:2405.16436, 2024
Cited by 25 · 2024
Learning from demonstration: Provably efficient adversarial policy imitation with linear function approximation
Z Liu, Y Zhang, Z Fu, Z Yang, Z Wang
International Conference on Machine Learning, 14094-14138, 2022
Cited by 25* · 2022
Welfare maximization in competitive equilibrium: Reinforcement learning for Markov exchange economy
Z Liu, M Lu, Z Wang, M Jordan, Z Yang
International Conference on Machine Learning, 13870-13911, 2022
Cited by 23 · 2022
Self-exploring language models: Active preference elicitation for online alignment
S Zhang, D Yu, H Sharma, H Zhong, Z Liu, Z Yang, S Wang, H Hassan, ...
arXiv preprint arXiv:2405.19332, 2024
Cited by 20 · 2024
Guarded policy optimization with imperfect online demonstrations
Z Xue, Z Peng, Q Li, Z Liu, B Zhou
arXiv preprint arXiv:2303.01728, 2023
Cited by 8 · 2023
Can large language models play games? A case study of a self-play approach
H Guo, Z Liu, Y Zhang, Z Wang
arXiv preprint arXiv:2403.05632, 2024
Cited by 7 · 2024
How Can LLM Guide RL? A Value-Based Approach
S Zhang, S Zheng, S Ke, Z Liu, W Jin, J Yuan, Y Yang, H Yang, Z Wang
arXiv preprint arXiv:2402.16181, 2024
Cited by 6 · 2024
Sample-efficient multi-agent RL: An optimization perspective
N Xiong, Z Liu, Z Wang, Z Yang
arXiv preprint arXiv:2310.06243, 2023
Cited by 2 · 2023
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
S Zhang, Z Liu, B Liu, Y Zhang, Y Yang, Y Liu, L Chen, T Sun, Z Wang
arXiv preprint arXiv:2410.08067, 2024
Cited by 1 · 2024
Just Say What You Want: Only-prompting Self-rewarding Online Preference Optimization
R Xu, Z Liu, Y Liu, S Yan, Z Wang, Z Zhang, X He
arXiv preprint arXiv:2409.17534, 2024
Cited by 1 · 2024
Toward Optimal LLM Alignments Using Two-Player Games
R Zheng, H Guo, Z Liu, X Zhang, Y Yao, X Xu, Z Wang, Z Xi, T Gui, ...
arXiv preprint arXiv:2406.10977, 2024
Cited by 1 · 2024
A Principled Framework for Knowledge-enhanced Large Language Model
S Wang, Z Liu, Z Wang, J Guo
arXiv preprint arXiv:2311.11135, 2023
Cited by 1 · 2023
Hindsight Planner: A Closed-Loop Few-Shot Planner for Embodied Instruction Following
Y Yang, S Zhang, Z Liu, H Yao, Z Wang
arXiv preprint arXiv:2412.19562, 2024
Year: 2024
DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs
Z Liu, S Zhang, Y Liu, B Liu, Y Yang, Z Wang
arXiv preprint arXiv:2411.13611, 2024
Year: 2024