ChatGLM: A family of large language models from GLM-130B to GLM-4 All Tools
T GLM, A Zeng, B Xu, B Wang, C Zhang, D Yin, D Zhang, D Rojas, G Feng, ...
arXiv preprint arXiv:2406.12793, 2024
Cited by: 380*; year: 2024

Survey on factuality in large language models: Knowledge, retrieval and domain-specificity
C Wang, X Liu, Y Yue, X Tang, T Zhang, C Jiayang, Y Yao, W Gao, X Hu, ...
arXiv preprint arXiv:2310.07521, 2023
Cited by: 186; year: 2023

Knowledge conflicts for LLMs: A survey
R Xu, Z Qi, Z Guo, C Wang, H Wang, Y Zhang, W Xu
arXiv preprint arXiv:2403.08319, 2024
Cited by: 58; year: 2024

VisualAgentBench: Towards large multimodal models as visual foundation agents
X Liu, T Zhang, Y Gu, IL Iong, Y Xu, X Song, S Zhang, H Lai, X Liu, H Zhao, ...
arXiv preprint arXiv:2408.06327, 2024
Cited by: 19*; year: 2024

NaturalCodeBench: Examining coding performance mismatch on HumanEval and natural user prompts
S Zhang, H Zhao, X Liu, Q Zheng, Z Qi, X Gu, X Zhang, Y Dong, J Tang
arXiv preprint arXiv:2405.04520, 2024
Cited by: 14*; year: 2024

MR-Ben: A meta-reasoning benchmark for evaluating system-2 thinking in LLMs
Z Zeng, Y Liu, Y Wan, J Li, P Chen, J Dai, Y Yao, R Xu, Z Qi, W Zhao, ...
arXiv preprint arXiv:2406.13975, 2024
Cited by: 12*; year: 2024

Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias
R Xu, Z Zhou, T Zhang, Z Qi, S Yao, K Xu, W Xu, H Qiu
arXiv preprint arXiv:2407.15366, 2024
Cited by: 8; year: 2024

Preemptive answer "attacks" on chain-of-thought reasoning
R Xu, Z Qi, W Xu
arXiv preprint arXiv:2405.20902, 2024
Cited by: 6; year: 2024

Survey on factuality in large language models: knowledge, retrieval and domain-specificity (2023)
C Wang, X Liu, Y Yue, X Tang, T Zhang, C Jiayang, Y Yao, W Gao, X Hu, ...
Cited on, 1
Cited by: 5

DebateQA: Evaluating question answering on debatable knowledge
R Xu, X Qi, Z Qi, W Xu, Z Guo
arXiv preprint arXiv:2408.01419, 2024
Cited by: 4; year: 2024

AutoGLM: Autonomous foundation agents for GUIs
X Liu, B Qin, D Liang, G Dong, H Lai, H Zhang, H Zhao, IL Iong, J Sun, ...
arXiv preprint arXiv:2411.00820, 2024
Cited by: 3; year: 2024

Bias and Volatility: A Statistical Framework for Evaluating Large Language Model's Stereotypes and the Associated Generation Inconsistency
Y Liu, K Yang, Z Qi, X Liu, Y Yu, CX Zhai
The Thirty-eighth Conference on Neural Information Processing Systems …
Cited by: 3*

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Z Qi, X Liu, IL Iong, H Lai, X Sun, W Zhao, Y Yang, X Yang, J Sun, S Yao, ...
arXiv preprint arXiv:2411.02337, 2024
Cited by: 1; year: 2024

Long RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall
Z Qi, R Xu, Z Guo, C Wang, H Zhang, W Xu
arXiv preprint arXiv:2410.23000, 2024
Year: 2024