AgentBench: Evaluating LLMs as agents | X Liu, H Yu, H Zhang, Y Xu, X Lei, H Lai, Y Gu, H Ding, K Men, K Yang, ... | ICLR 2024, 2023 | 392* | 2023 |
SafetyBench: Evaluating the safety of large language models with multiple choice questions | Z Zhang, L Lei, L Wu, R Sun, Y Huang, C Long, X Liu, X Lei, J Tang, ... | arXiv preprint arXiv:2309.07045, 2023 | 131 | 2023 |
AlignBench: Benchmarking Chinese Alignment of Large Language Models | X Liu*, X Lei*, S Wang, Y Huang, Z Feng, B Wen, J Cheng, P Ke, Y Xu, ... | ACL 2024, 2023 | 50 | 2023 |
CritiqueLLM: Towards an informative critique generation model for evaluation of large language model generation | P Ke, B Wen, A Feng, X Liu, X Lei, J Cheng, S Wang, A Zeng, Y Dong, ... | Proceedings of the 62nd Annual Meeting of the Association for Computational …, 2024 | 45* | 2024 |
Scaffolding coordinates to promote vision-language coordination in large multi-modal models | X Lei, Z Yang, X Chen, P Li, Y Liu | COLING 2025, ACL 2024 Wordplay Workshop, 2024 | 19 | 2024 |
XDAI: A tuning-free framework for exploiting pre-trained language models in knowledge grounded dialogue generation | J Yu, X Zhang, Y Xu, X Lei, X Guan, J Zhang, L Hou, J Li, J Tang | Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and …, 2022 | 15 | 2022 |
A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation | J Yu, X Zhang, Y Xu, X Lei, Z Yao, J Zhang, L Hou, J Li | COLING 2024, 2024 | 1 | 2024 |
AIGS: Generating Science from AI-Powered Automated Falsification | Z Liu*, K Liu*, Y Zhu*, X Lei*, Z Yang*, Z Zhang, P Li, Y Liu | arXiv preprint arXiv:2411.11910, 2024 | | 2024 |