Generative pretraining in multimodality Q Sun*, Q Yu*, Y Cui*, F Zhang*, X Zhang*, Y Wang, H Gao, J Liu, ... ICLR, 2024 | 230* | 2024 |
Exploring the universal vulnerability of prompt-based learning paradigm L Xu, Y Chen, G Cui, H Gao, Z Liu Findings of NAACL, 2022 | 86 | 2022 |
Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evaluations L Yuan, Y Chen, G Cui, H Gao, F Zou, X Cheng, H Ji, Z Liu, M Sun NeurIPS (Dataset and Benchmark Track) 36, 2023 | 81 | 2023 |
Why should adversarial perturbations be imperceptible? Rethink the research paradigm in adversarial NLP Y Chen*, H Gao*, G Cui, F Qi, L Huang, Z Liu, M Sun EMNLP, 2022 | 43 | 2022 |
Efficient detection of LLM-generated texts with a Bayesian surrogate model Y Miao*, H Gao*, H Zhang, Z Deng Findings of ACL, 2024 | 21 | 2024 |
Evaluating the robustness of text-to-image diffusion models against real-world attacks H Gao, H Zhang, Y Dong, Z Deng arXiv preprint arXiv:2306.13103, 2023 | 21 | 2023 |
Textual backdoor attacks can be more harmful via two simple tricks Y Chen*, F Qi*, H Gao, Z Liu, M Sun EMNLP, 2022 | 21 | 2022 |
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? R Cao, F Lei, H Wu, J Chen, Y Fu, H Gao, X Xiong, H Zhang, Y Mao, W Hu, ... NeurIPS (Dataset and Benchmark Track), 2024 | 19* | 2024 |
Kimi k1.5: Scaling reinforcement learning with LLMs K Team, A Du, B Gao, B Xing, C Jiang, C Chen, C Li, C Xiao, C Du, C Liao, ... arXiv preprint arXiv:2501.12599, 2025 | 13 | 2025 |
Universal Prompt Optimizer for Safe Text-to-Image Generation Z Wu*, H Gao*, Y Wang, X Zhang, S Wang NAACL, 2024 | 13 | 2024 |
Is factuality decoding a free lunch for LLMs? Evaluation on knowledge editing benchmark B Bi, S Liu, Y Wang, L Mei, J Fang, H Gao, S Ni, X Cheng ICLR, 2025 | 8* | 2025 |
StruEdit: Structured outputs enable the fast and accurate knowledge editing for large language models B Bi, S Liu, Y Wang, L Mei, H Gao, J Fang, X Cheng arXiv preprint arXiv:2409.10132, 2024 | 8 | 2024 |
Spider 2.0: Evaluating language models on real-world enterprise text-to-SQL workflows F Lei, J Chen, Y Ye, R Cao, D Shin, H Su, Z Suo, H Gao, W Hu, P Yin, ... ICLR, 2025 | 6 | 2025 |
AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models Z Zeng*, Y Miao*, H Gao, H Zhang, Z Deng Findings of EMNLP, 2024 | 6 | 2024 |
Adaptive Token Biaser: Knowledge Editing via Biasing Key Entities B Bi, S Liu, Y Wang, L Mei, H Gao, Y Xu, X Cheng Findings of EMNLP, 2024 | 5 | 2024 |
From adversarial arms race to model-centric evaluation: Motivating a unified automatic robustness evaluation framework Y Chen*, H Gao*, G Cui*, L Yuan, D Kong, H Wu, N Shi, B Yuan, L Huang, ... Findings of ACL, 2023 | 5 | 2023 |
Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts H Gao*, T Pang*, C Du, T Hu, Z Deng, M Lin arXiv preprint arXiv:2410.12777, 2024 | 4 | 2024 |
Leveraging Catastrophic Forgetting to Develop Safe Diffusion Models against Malicious Finetuning J Pan*, H Gao*, Z Wu, T Hu, L Su, Q Huang, L Li NeurIPS, 2024 | 2* | 2024 |
GuardReasoner: Towards Reasoning-based LLM Safeguards Y Liu, H Gao, S Zhai, J Xia, T Wu, Z Xue, Y Chen, K Kawaguchi, J Zhang, ... arXiv preprint arXiv:2501.18492, 2025 | 1 | 2025 |
SafeCFG: Redirecting Harmful Classifier-Free Guidance for Safe Generation J Pan, H Gao, L Li, ZJ Zha, Q Huang, J Luo arXiv preprint arXiv:2412.16039, 2024 | | 2024 |