Ziniu Li
Other names: Zi-Niu Li
The Chinese University of Hong Kong, Shenzhen
Verified email at link.cuhk.edu.cn - Homepage
Title
Cited by
Year
Error bounds of imitating policies and environments
T Xu, Z Li, Y Yu
Neural Information Processing Systems (NeurIPS), 2020
Cited by: 110*, Year: 2020
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Z Li, T Xu, Y Zhang, Z Lin, Y Yu, R Sun, ZQ Luo
International Conference on Machine Learning (ICML), 2024
Cited by: 52*, Year: 2024
Error bounds of imitating policies and environments for reinforcement learning
T Xu, Z Li, Y Yu
IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (10), 6968 …, 2021
Cited by: 43, Year: 2021
Adam-mini: Use fewer learning rates to gain more
Y Zhang, C Chen, Z Li, T Ding, C Wu, DP Kingma, Y Ye, ZQ Luo, R Sun
arXiv preprint arXiv:2406.16793, 2024
Cited by: 33*, Year: 2024
Why transformers need Adam: A Hessian perspective
Y Zhang, C Chen, T Ding, Z Li, R Sun, ZQ Luo
Neural Information Processing Systems (NeurIPS), 2024
Cited by: 29, Year: 2024
Self-Guided Evolution Strategies with Historical Estimated Gradients
FY Liu, ZN Li, C Qian
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Cited by: 23, Year: 2020
When is RL better than DPO in RLHF? A Representation and Optimization Perspective
Z Li, T Xu, Y Yu
Tiny Paper of International Conference on Learning Representations (ICLR), 2024
Cited by: 22*, Year: 2024
HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning
Z Li, Y Li, Y Zhang, T Zhang, ZQ Luo
International Conference on Learning Representations (ICLR), 2022
Cited by: 20, Year: 2022
Rethinking ValueDice - Does It Really Improve Performance?
Z Li, T Xu, Y Yu, ZQ Luo
Blog of International Conference on Learning Representations (ICLR), 2022
Cited by: 17, Year: 2022
Understanding adversarial imitation learning in small sample regime: A stage-coupled analysis
T Xu, Z Li, Y Yu, ZQ Luo
arXiv preprint arXiv:2208.01899, 2022
Cited by: 15*, Year: 2022
Imitation learning from imperfection: Theoretical justifications and algorithms
Z Li, T Xu, Z Qin, Y Yu, ZQ Luo
Neural Information Processing Systems (NeurIPS), 2023
Cited by: 14*, Year: 2023
Provably Efficient Adversarial Imitation Learning with Unknown Transitions
T Xu, Z Li, Y Yu, ZQ Luo
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Cited by: 9, Year: 2023
On the algorithmic bias of aligning large language models with RLHF: Preference collapse and matching regularization
J Xiao, Z Li, X Xie, E Getzen, C Fang, Q Long, WJ Su
arXiv preprint arXiv:2405.16455, 2024
Cited by: 8, Year: 2024
A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle
Z Li, T Xu, Y Yu
arXiv preprint arXiv:2203.11489, 2022
Cited by: 5, Year: 2022
Entropic distribution matching in supervised fine-tuning of LLMs: Less overfitting and better diversity
Z Li, C Chen, T Xu, Z Qin, J Xiao, R Sun, ZQ Luo
arXiv preprint arXiv:2408.16673, 2024
Cited by: 4*, Year: 2024
Deploying offline reinforcement learning with human feedback
Z Li, K Xu, L Liu, L Li, D Ye, P Zhao
arXiv preprint arXiv:2303.07046, 2023
Cited by: 3, Year: 2023
Sensing jamming strategy from limited observations: An imitation learning perspective
Y Fan, B Jiu, W Pu, Z Li, K Li, H Liu
IEEE Transactions on Signal Processing, 2024
Cited by: 2, Year: 2024
Unlocking Black-Box Prompt Tuning Efficiency via Zeroth-Order Optimization
H Zhan, C Chen, T Ding, Z Li, R Sun
Findings of Conference on Empirical Methods in Natural Language Processing …, 2024
Cited by: 1, Year: 2024
Efficient Exploration by Novelty-Pursuit
Z Li, XH Chen
International Conference on Distributed Artificial Intelligence (DAI), 2020
Cited by: 1, Year: 2020
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques
Z Tang, Z Li, Z Xiao, T Ding, R Sun, B Wang, D Liu, F Huang, T Liu, B Yu, ...
arXiv preprint arXiv:2501.14492, 2025
Year: 2025