The rise and potential of large language model based agents: A survey Z Xi, W Chen, X Guo, W He, Y Ding, B Hong, M Zhang, J Wang, S Jin, ... Science China Information Sciences 68 (2), 121101, 2025 | 756 | 2025 |
Secrets of RLHF in large language models part I: PPO R Zheng, S Dou, S Gao, Y Hua, W Shen, B Wang, Y Liu, S Jin, Q Liu, ... arXiv preprint arXiv:2307.04964, 2023 | 140* | 2023 |
EasyJailbreak: A unified framework for jailbreaking large language models W Zhou, X Wang, L Xiong, H Xia, Y Gu, M Chai, F Zhu, C Huang, S Dou, ... arXiv preprint arXiv:2403.12171, 2024 | 36 | 2024 |
MINER: Improving out-of-vocabulary named entity recognition from an information theoretic perspective X Wang, S Dou, L Xiong, Y Zou, Q Zhang, T Gui, L Qiao, Z Cheng, ... Proceedings of the 60th Annual Meeting of the Association for Computational …, 2022 | 32 | 2022 |
The rise and potential of large language model based agents: A survey, 2023 Z Xi, W Chen, X Guo, W He, Y Ding, B Hong, M Zhang, J Wang, S Jin, ... arXiv preprint arXiv:2309.07864, 2023 | 29 | 2023 |
The rise and potential of large language model based agents: A survey. arXiv 2023 Z Xi, W Chen, X Guo, W He, Y Ding, B Hong, M Zhang, J Wang, S Jin, ... arXiv preprint arXiv:2309.07864, 2023 | 22 | 2023 |
Zhiheng Xi W Zhou, X Wang, L Xiong, H Xia, Y Gu, M Chai, F Zhu, C Huang, S Dou Rui Zheng, Songyang Gao, Yicheng Zou, Hang Yan, Yifan Le, Ruohui Wang, Lijun …, 2024 | 18 | 2024 |
LoRAMoE: Alleviate world knowledge forgetting in large language models via MoE-style plugin S Dou, E Zhou, Y Liu, S Gao, J Zhao, W Shen, Y Zhou, Z Xi, X Wang, ... arXiv preprint arXiv:2312.09979, 2023 | 16 | 2023 |
StepCoder: Improve code generation with reinforcement learning from compiler feedback S Dou, Y Liu, H Jia, L Xiong, E Zhou, W Shen, J Shan, C Huang, X Wang, ... arXiv preprint arXiv:2402.01391, 2024 | 12 | 2024 |
Zhiheng Xi, Xiaoran Fan, et al. 2024. LoRAMoE: Alleviating world knowledge forgetting in large language models via MoE-style plugin S Dou, E Zhou, Y Liu, S Gao, W Shen, L Xiong, Y Zhou, X Wang Proceedings of the 62nd Annual Meeting of the Association for Computational …, 2024 | 10 | 2024 |
Zhiheng Xi, Yuhao Zhou, Tao Ji, Rui Zheng, Qi Zhang, Xuanjing Huang, and Tao Gui. StepCoder: Improve code generation with reinforcement learning from compiler feedback. CoRR … S Dou, Y Liu, H Jia, L Xiong arXiv preprint arXiv:2402.01391, 2024 | 8 | 2024 |
Zhiheng Xi, et al. 2024. StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback S Dou, Y Liu, H Jia, L Xiong, E Zhou, J Shan, C Huang, W Shen, X Fan arXiv preprint arXiv:2402.01391, 2024 | 7 | 2024 |
The rise and potential of large language model based agents: a survey. CoRR abs/2309.07864 (2023) Z Xi, W Chen, X Guo, W He, Y Ding, B Hong, M Zhang, J Wang, S Jin, ... | 7 | 2023 |
Delve into PPO: Implementation matters for stable RLHF R Zheng, S Dou, S Gao, Y Hua, W Shen, B Wang, Y Liu, S Jin, Y Zhou, ... NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following, 2023 | 6 | 2023 |
A Confidence-based Partial Label Learning Model for Crowd-Annotated Named Entity Recognition L Xiong, J Zhou, Q Zhu, X Wang, Y Wu, Q Zhang, T Gui, X Huang, J Ma, ... Findings of the Association for Computational Linguistics: ACL 2023, 2023 | 5 | 2023 |
MetaRM: Shifted distributions alignment via meta-learning S Dou, Y Liu, E Zhou, T Li, H Jia, L Xiong, X Zhao, J Ye, R Zheng, T Gui, ... arXiv preprint arXiv:2405.00438, 2024 | 2 | 2024 |
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment E Zhou, G Zheng, B Wang, Z Xi, S Dou, R Bao, W Shen, L Xiong, J Fan, ... arXiv preprint arXiv:2410.09893, 2024 | 1 | 2024 |
Multi-Programming Language Sandbox for LLMs S Dou, J Zhang, J Zang, Y Tao, W Zhou, H Jia, S Liu, Y Yang, Z Xi, S Wu, ... arXiv preprint arXiv:2410.23074, 2024 | | 2024 |