AutoDAN: Generating stealthy jailbreak prompts on aligned large language models. X Liu, N Xu, M Chen, C Xiao. ICLR 2024. Cited by 363.
Protecting facial privacy: Generating adversarial identity masks via style-robust makeup transfer. S Hu, X Liu, Y Zhang, M Li, LY Zhang, H Jin, L Wu. CVPR 2022, pp. 15014-15023. Cited by 117.
Don't listen to me: Understanding and exploring jailbreak prompts of large language models. Z Yu, X Liu, S Liang, Z Cameron, C Xiao, N Zhang. 33rd USENIX Security Symposium (USENIX Security 2024), Distinguished Paper Award. Cited by 63.
JailBreakV-28K: A benchmark for assessing the robustness of multimodal large language models against jailbreak attacks. W Luo, S Ma, X Liu, X Guo, C Xiao. COLM 2024. Cited by 46.
Detecting backdoors during the inference stage based on corruption robustness consistency. X Liu, M Li, H Wang, S Hu, D Ye, H Jin, L Wu, C Xiao. CVPR 2023, pp. 16363-16372. Cited by 36.
AdvHash: Set-to-set targeted attack on deep hashing with one single adversarial patch. S Hu, Y Zhang, X Liu, LY Zhang, M Li, H Jin. ACM MM 2021, pp. 2335-2343. Cited by 36.
MuirBench: A comprehensive benchmark for robust multi-image understanding. F Wang, X Fu, JY Huang, Z Li, Q Liu, X Liu, MD Ma, N Xu, W Zhou, et al. ICLR 2025. Cited by 33.
AdaShield: Safeguarding multimodal large language models from structure-based attack via adaptive shield prompting. Y Wang, X Liu, Y Li, M Chen, C Xiao. ECCV 2024, pp. 77-94. Cited by 33.
Automatic and universal prompt injection attacks against large language models. X Liu, Z Yu, Y Zhang, N Zhang, C Xiao. arXiv preprint arXiv:2403.04957, 2024. Cited by 30.
Why does little robustness help? A further step towards understanding adversarial transferability. Y Zhang, S Hu, LY Zhang, J Shi, M Li, X Liu, W Wan, H Jin. IEEE Symposium on Security and Privacy (S&P) 2024, pp. 3365-3384. Cited by 24.
DeceptPrompt: Exploiting LLM-driven code generation via adversarial natural language instructions. F Wu, X Liu, C Xiao. arXiv preprint arXiv:2312.04730, 2023. Cited by 24.
Towards efficient data-centric robust machine learning with noise-based augmentation. X Liu, H Wang, Y Zhang, F Wu, S Hu. arXiv preprint arXiv:2203.03810, 2022. Cited by 17.
PointCRT: Detecting backdoor in 3D point cloud via corruption robustness. S Hu, W Liu, M Li, Y Zhang, X Liu, X Wang, LY Zhang, J Hou. ACM MM 2023, pp. 666-675. Cited by 16.
Visual-RolePlay: Universal jailbreak attack on multimodal large language models via role-playing image character. S Ma, W Luo, Y Wang, X Liu, M Chen, B Li, C Xiao. arXiv preprint arXiv:2405.20773, 2024. Cited by 13.
AutoDAN-Turbo: A lifelong agent for strategy self-exploration to jailbreak LLMs. X Liu, P Li, E Suh, Y Vorobeychik, Z Mao, S Jha, P McDaniel, H Sun, B Li, et al. ICLR 2025. Cited by 5.
InjecGuard: Benchmarking and mitigating over-defense in prompt injection guardrail models. H Li, X Liu. arXiv preprint arXiv:2410.22770, 2024. Cited by 2.
RePD: Defending jailbreak attack through a retrieval-based prompt decomposition process. P Wang, X Liu, C Xiao. NAACL 2025 Findings.