Zhihan Liu
Title · Cited by · Year
Reason for future, act for now: A principled architecture for autonomous LLM agents
Z Liu, H Hu, S Zhang, H Guo, S Ke, B Liu, Z Wang
Forty-first International Conference on Machine Learning, 2023
Cited by 40* · 2023
Maximize to explore: One objective function fusing estimation, planning, and exploration
Z Liu, M Lu, W Xiong, H Zhong, H Hu, S Zhang, S Zheng, Z Yang, Z Wang
Advances in Neural Information Processing Systems 36, 2024
Cited by 33* · 2024
Provably mitigating overoptimization in RLHF: Your SFT loss is implicitly an adversarial regularizer
Z Liu, M Lu, S Zhang, B Liu, H Guo, Y Yang, J Blanchet, Z Wang
arXiv preprint arXiv:2405.16436, 2024
Cited by 25 · 2024
Learning from demonstration: Provably efficient adversarial policy imitation with linear function approximation
Z Liu, Y Zhang, Z Fu, Z Yang, Z Wang
International Conference on Machine Learning, 14094-14138, 2022
Cited by 25* · 2022
Welfare maximization in competitive equilibrium: Reinforcement learning for Markov exchange economy
Z Liu, M Lu, Z Wang, M Jordan, Z Yang
International Conference on Machine Learning, 13870-13911, 2022
Cited by 23 · 2022
Self-exploring language models: Active preference elicitation for online alignment
S Zhang, D Yu, H Sharma, H Zhong, Z Liu, Z Yang, S Wang, H Hassan, ...
arXiv preprint arXiv:2405.19332, 2024
Cited by 20 · 2024
Guarded policy optimization with imperfect online demonstrations
Z Xue, Z Peng, Q Li, Z Liu, B Zhou
arXiv preprint arXiv:2303.01728, 2023
Cited by 8 · 2023
Can large language models play games? A case study of a self-play approach
H Guo, Z Liu, Y Zhang, Z Wang
arXiv preprint arXiv:2403.05632, 2024
Cited by 7 · 2024
How Can LLM Guide RL? A Value-Based Approach
S Zhang, S Zheng, S Ke, Z Liu, W Jin, J Yuan, Y Yang, H Yang, Z Wang
arXiv preprint arXiv:2402.16181, 2024
Cited by 6 · 2024
Sample-efficient multi-agent RL: An optimization perspective
N Xiong, Z Liu, Z Wang, Z Yang
arXiv preprint arXiv:2310.06243, 2023
Cited by 2 · 2023
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
S Zhang, Z Liu, B Liu, Y Zhang, Y Yang, Y Liu, L Chen, T Sun, Z Wang
arXiv preprint arXiv:2410.08067, 2024
Cited by 1 · 2024
Just Say What You Want: Only-prompting Self-rewarding Online Preference Optimization
R Xu, Z Liu, Y Liu, S Yan, Z Wang, Z Zhang, X He
arXiv preprint arXiv:2409.17534, 2024
Cited by 1 · 2024
Toward Optimal LLM Alignments Using Two-Player Games
R Zheng, H Guo, Z Liu, X Zhang, Y Yao, X Xu, Z Wang, Z Xi, T Gui, ...
arXiv preprint arXiv:2406.10977, 2024
Cited by 1 · 2024
A Principled Framework for Knowledge-enhanced Large Language Model
S Wang, Z Liu, Z Wang, J Guo
arXiv preprint arXiv:2311.11135, 2023
Cited by 1 · 2023
Hindsight Planner: A Closed-Loop Few-Shot Planner for Embodied Instruction Following
Y Yang, S Zhang, Z Liu, H Yao, Z Wang
arXiv preprint arXiv:2412.19562, 2024
Year: 2024
DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs
Z Liu, S Zhang, Y Liu, B Liu, Y Yang, Z Wang
arXiv preprint arXiv:2411.13611, 2024
Year: 2024