CMMLU: Measuring massive multitask language understanding in Chinese. H Li, Y Zhang, F Koto, Y Yang, H Zhao, Y Gong, N Duan, T Baldwin. arXiv preprint arXiv:2306.09212, 2023. Cited by 195.
Confidence matters: Revisiting intrinsic self-correction capabilities of large language models. L Li, Z Chen, G Chen, Y Zhang, Y Su, E Xing, K Zhang. arXiv preprint arXiv:2402.12563, 2024. Cited by 24.
Learning from failure: Integrating negative examples when fine-tuning large language models as agents. R Wang, H Li, X Han, Y Zhang, T Baldwin. arXiv preprint arXiv:2402.11651, 2024. Cited by 18.
Against The Achilles' Heel: A Survey on Red Teaming for Generative Models. L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang, J Gao, Y Zhang, W Che, ... Journal of Artificial Intelligence Research 82, 687-775, 2025. Cited by 14.
Can Large Language Model Comprehend Ancient Chinese? A Preliminary Test on ACLUE. Y Zhang, H Li. Ancient Language Processing Workshop, 2023. Cited by 12.
Causal Representation Learning from Multimodal Biological Observations. Y Sun, L Kong, G Chen, L Li, G Luo, Z Li, Y Zhang, Y Zheng, M Yang, ... arXiv preprint arXiv:2411.06518, 2024.