Xiezhi: An ever-updating benchmark for holistic domain knowledge evaluation Z Gu, X Zhu, H Ye, L Zhang, J Wang, Y Zhu, S Jiang, Z Xiong, Z Li, W Wu, ... Proceedings of the AAAI Conference on Artificial Intelligence 38 (16), 18099 …, 2024 | 48 | 2024 |
Agent Group Chat: An Interactive Group Chat Simulacra For Better Eliciting Collective Emergent Behavior Z Gu, X Zhu, H Guo, L Zhang, Y Cai, H Shen, J Chen, Z Ye, Y Dai, Y Gao, ... arXiv preprint arXiv:2403.13433, 2024 | 7 | 2024 |
Sem4SAP: Synonymous Expression Mining from Open Knowledge Graph for Language Model Synonym-Aware Pretraining Z Gu, S Jiang, W Huang, J Liang, H Feng, Y Xiao arXiv preprint arXiv:2303.14425, 2023 | 4 | 2023 |
The missing piece in model editing: A deep dive into the hidden damage brought by model editing J Wang, Z Gu, Z Xiong, H Feng, Y Xiao arXiv preprint arXiv:2403.07825, 2024 | 3 | 2024 |
Beyond the Obvious: Evaluating the Reasoning Ability In Real-life Scenarios of Language Models on Life Scapes Reasoning Benchmark (LSR-Benchmark) Z Gu, Z Li, L Zhang, Z Xiong, S Jiang, X Zhu, S Wang, Z Wang, J Wang, ... arXiv preprint arXiv:2307.05113, 2023 | 2 | 2023 |
LLM-GAN: Construct Generative Adversarial Network Through Large Language Models For Explainable Fake News Detection Y Wang, Z Gu, S Zhang, S Zheng, T Wang, T Li, H Feng, Y Xiao arXiv preprint arXiv:2409.01787, 2024 | 1 | 2024 |
GANTEE: Generative Adversarial Network for Taxonomy Enterance Evaluation Z Gu, S Jiang, J Liu, Y Xiao, H Feng, Z Li, J Liang, Z Jian Proceedings of the AAAI Conference on Artificial Intelligence 37 (5), 6380-6388, 2023 | 1 | 2023 |
Domain Mastery Benchmark: An Ever-Updating Benchmark for Evaluating Holistic Domain Knowledge of Large Language Model--A Preliminary Release Z Gu, X Zhu, H Ye, L Zhang, Z Xiong, Z Li, Q He, S Jiang, H Feng, Y Xiao arXiv preprint arXiv:2304.11679, 2023 | 1 | 2023 |
Enhancing Link Prediction Based on Simple Path Graphs Z Li, Y Cai, H Feng International Conference on Database Systems for Advanced Applications, 319-334, 2024 | | 2024 |
DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence? Z Gu, L Zhang, X Zhu, J Chen, W Huang, Y Zhang, S Wang, Z Ye, Y Gao, ... arXiv preprint arXiv:2406.12641, 2024 | | 2024 |
VCEval: Rethinking What is a Good Educational Video and How to Automatically Evaluate It X Zhu, Z Gu, S Jiang, Z Li, H Feng, Y Xiao arXiv preprint arXiv:2407.12005, 2024 | | 2024 |
StructBench: An Autogenerated Benchmark for Evaluating Large Language Model's Ability in Structure-Rich Text Understanding Z Gu, H Ye, Z Zhou, H Feng, Y Xiao arXiv preprint arXiv:2406.10621, 2024 | | 2024 |
Structure-Rich Text Benchmark for Knowledge Inference Evaluation H Ye, Z Gu, Z Zhou, S Jiang, H Feng, Y Xiao | | |