A survey on evaluation of large language models Y Chang, X Wang, J Wang, Y Wu, L Yang, K Zhu, H Chen, X Yi, C Wang, ... ACM Transactions on Intelligent Systems and Technology 15 (3), 1-45, 2024 | 2134 | 2024 |
Survey on factuality in large language models: Knowledge, retrieval and domain-specificity C Wang, X Liu, Y Yue, X Tang, T Zhang, C Jiayang, Y Yao, W Gao, X Hu, ... arXiv preprint arXiv:2310.07521, 2023 | 186 | 2023 |
PandaLM: An automatic evaluation benchmark for LLM instruction tuning optimization Y Wang, Z Yu, Z Zeng, L Yang, C Wang, H Chen, C Jiang, R Xie, J Wang, ... ICLR 2024, 2023 | 183 | 2023 |
Does It Make Sense? And Why? A Pilot Study for Sense Making and Explanation C Wang, S Liang, Y Zhang, X Li, T Gao ACL 2019, 4020–4026, 2019 | 120 | 2019 |
SemEval-2020 task 4: Commonsense validation and explanation C Wang, S Liang, Y Jin, Y Wang, X Zhu, Y Zhang SemEval-2020 Task track, 2020 | 112 | 2020 |
Can generative pre-trained language models serve as knowledge bases for closed-book QA? C Wang, P Liu, Y Zhang ACL 2021, 2021 | 84 | 2021 |
A survey on evaluation of large language models. arXiv Y Chang, X Wang, J Wang, Y Wu, L Yang, K Zhu, H Chen, X Yi, C Wang, ... Preprint posted online on Dec 29, 2023 | 69* | 2023 |
Knowledge conflicts for LLMs: A survey R Xu, Z Qi, Z Guo, C Wang, H Wang, Y Zhang, W Xu arXiv preprint arXiv:2403.08319, 2024 | 51 | 2024 |
Evaluating Open-QA Evaluation C Wang, S Cheng, Q Guo, Y Yue, B Ding, Z Xu, Y Wang, X Hu, Z Zhang, ... Advances in Neural Information Processing Systems 36, 2023 | 51* | 2023 |
A survey on evaluation of large language models (2023) Y Chang, X Wang, J Wang, Y Wu, L Yang, K Zhu, H Chen, X Yi, C Wang, ... | 31* | |
LLMs with chain-of-thought are non-causal reasoners G Bao, H Zhang, L Yang, C Wang, Y Zhang arXiv preprint arXiv:2402.16048, 2024 | 24 | 2024 |
A survey on evaluation of large language models. arXiv Y Chang, X Wang, J Wang, Y Wu, L Yang, K Zhu, H Chen, X Yi, C Wang, ... Preprint posted online on Dec 29, 2023 | 24 | 2023 |
Exploring generalization ability of pretrained language models on arithmetic and logical reasoning C Wang, B Zheng, Y Niu, Y Zhang Natural Language Processing and Chinese Computing: 10th CCF International …, 2021 | 21 | 2021 |
RFiD: Towards Rational Fusion-in-Decoder for Open-Domain Question Answering C Wang, H Yu, Y Zhang Findings of the Association for Computational Linguistics: ACL 2023, 2023 | 18 | 2023 |
A survey on evaluation of large language models. arXiv 2023 Y Chang, X Wang, J Wang, Y Wu, K Zhu, H Chen, L Yang, X Yi, C Wang, ... arXiv preprint arXiv:2307.03109 10, 2023 | 16* | 2023 |
Self-DC: When to retrieve and When to generate? Self Divide-and-Conquer for Compositional Unknown Questions H Wang, B Xue, B Zhou, T Zhang, C Wang, G Chen, H Wang, K Wong arXiv preprint arXiv:2402.13514, 2024 | 13 | 2024 |
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization Y Wang, Z Yu, Z Zeng, L Yang, C Wang, H Chen, C Jiang, R Xie, J Wang, ... | 13* | 2023 |
SHIELD: Evaluation and Defense Strategies for Copyright Compliance in LLM Text Generation X Liu, T Sun, T Xu, F Wu, C Wang, X Wang, J Gao arXiv preprint arXiv:2406.12975, 2024 | 10 | 2024 |
NovelQA: A benchmark for long-range novel question answering C Wang, R Ning, B Pan, T Wu, Q Guo, C Deng, G Bao, Q Wang, Y Zhang arXiv preprint arXiv:2403.12766, 2024 | 10 | 2024 |
RAGChecker: A fine-grained framework for diagnosing retrieval-augmented generation D Ru, L Qiu, X Hu, T Zhang, P Shi, S Chang, C Jiayang, C Wang, S Sun, ... arXiv preprint arXiv:2408.08067, 2024 | 8 | 2024 |