Large language model alignment: A survey T Shen, R Jin, Y Huang, C Liu, W Dong, Z Guo, X Wu, Y Liu, D Xiong arXiv preprint arXiv:2309.15025, 2023 | 165 | 2023 |
Evaluating large language models: A comprehensive survey Z Guo, R Jin, C Liu, Y Huang, D Shi, L Yu, Y Liu, J Li, B Xiong, D Xiong arXiv preprint arXiv:2310.19736, 2023 | 140 | 2023 |
Evaluating Large Language Models: A Comprehensive Survey Z Guo, R Jin, C Liu, Y Huang, D Shi arXiv preprint arXiv:2310.19736, 2023 | 37 | 2023 |
M3KE: A massive multi-level multi-subject knowledge evaluation benchmark for Chinese large language models C Liu, R Jin, Y Ren, L Yu, T Dong, X Peng, S Zhang, J Peng, P Zhang, ... arXiv preprint arXiv:2305.10263, 2023 | 27 | 2023 |
BERT with enhanced layer for assistant diagnosis based on Chinese obstetric EMRs K Zhang, C Liu, X Duan, L Zhou, Y Zhao, H Zan 2019 International Conference on Asian Language Processing (IALP), 384-389, 2019 | 7 | 2019 |
HisBERT for conversational reading comprehension C Liu, D Xiong, Y Jia, H Zan, C Hu 2020 International Conference on Asian Language Processing (IALP), 147-152, 2020 | 5 | 2020 |
OpenEval: benchmarking Chinese LLMs across capability, alignment and safety C Liu, L Yu, J Li, R Jin, Y Huang, L Shi, J Zhang, X Ji, T Cui, T Liu, J Song, ... arXiv preprint arXiv:2403.12316, 2024 | 4 | 2024 |
Evaluating Chinese large language models on discipline knowledge acquisition via memorization and robustness assessment C Liu, R Jin, M Steedman, D Xiong Proceedings of the 1st Workshop on Data Contamination (CONDA), 1-12, 2024 | 3 | 2024 |
Tab-CQA: A tabular conversational question answering dataset on financial reports C Liu, J Li, D Xiong Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023 | 3 | 2023 |
Large language model safety: A holistic survey D Shi, T Shen, Y Huang, Z Li, Y Leng, R Jin, C Liu, X Wu, Z Guo, L Yu, ... arXiv preprint arXiv:2412.17686, 2024 | 2 | 2024 |
LHMKE: A large-scale holistic multi-subject knowledge evaluation benchmark for Chinese large language models C Liu, R Jin, Y Ren, D Xiong arXiv preprint arXiv:2403.12601, 2024 | 2 | 2024 |
Empirical Study on Data Attributes Insufficiency of Evaluation Benchmarks for LLMs C Liu, R Jin, Z Yao, T Li, L Cheng, M Steedman, D Xiong Proceedings of the 31st International Conference on Computational …, 2025 | | 2025 |
[Industry] Tab-CQA: A Tabular Conversational Question Answering Dataset on Financial Reports C Liu, J Li, D Xiong The 61st Annual Meeting Of The Association For Computational Linguistics, 2023 | | 2023 |
TGEA 2.0: a large-scale diagnostically annotated dataset with benchmark tasks for text generation of pretrained language models H Ge, X Zhao, C Liu, Y Zeng, Q Liu, D Xiong Advances in Neural Information Processing Systems 35, 31612-31626, 2022 | | 2022 |
TG Network: A Model that More Effectively Identifies the Use of the Auxiliary Word “DE” C Liu, H Zan, X Duan, K Zhang, Y Han Chinese Lexical Semantics: 20th Workshop, CLSW 2019, Beijing, China, June 28 …, 2020 | | 2020 |