GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models. H Jin, R Chen, A Zhou, Y Zhang, H Wang. arXiv preprint arXiv:2402.03299, 2024. Cited by 24.
JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models. H Jin, L Hu, X Li, P Zhang, C Chen, J Zhuang, H Wang. arXiv preprint arXiv:2407.01599, 2024. Cited by 23.
ROBY: Evaluating the Adversarial Robustness of a Deep Model by Its Decision Boundaries. H Jin, J Chen, H Zheng, Z Wang, J Xiao, S Yu, Z Ming. Information Sciences 587, 97-122, 2022. Cited by 17.
CertPri: Certifiable Prioritization for Deep Neural Networks via Movement Cost in Feature Space. H Zheng, J Chen, H Jin. 2023 38th IEEE/ACM International Conference on Automated Software …, 2023. Cited by 12.
EditShield: Protecting Unauthorized Image Editing by Instruction-Guided Diffusion Models. R Chen, H Jin, Y Liu, J Chen, H Wang, L Sun. European Conference on Computer Vision, 126-142, 2024. Cited by 10.
Excitement Surfeited Turns to Errors: Deep Learning Testing Framework Based on Excitable Neurons. H Jin, R Chen, H Zheng, J Chen, Y Cheng, Y Yu, T Chen, X Liu. Information Sciences 637, 118936, 2023. Cited by 8.
Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters. H Jin, A Zhou, JD Menke, H Wang. arXiv preprint arXiv:2405.20413, 2024. Cited by 4.
DeepSensor: Deep Learning Testing Framework Based on Neuron Sensitivity. H Jin, R Chen, H Zheng, J Chen, Z Liu, Q Xuan, Y Yu, Y Cheng. arXiv preprint arXiv:2202.07464, 2022. Cited by 2.
Quack: Automatic Jailbreaking Large Language Models via Role-playing. H Jin, R Chen, J Chen, H Wang. Cited by 2.
Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization. P Zhang, H Jin, L Hu, X Li, L Kang, M Luo, Y Song, H Wang. arXiv preprint arXiv:2412.03092, 2024. Cited by 1.
CatchBackdoor: Backdoor Testing by Critical Trojan Neural Path Identification via Differential Fuzzing. H Jin, R Chen, J Chen, Y Cheng, C Fu, T Wang, Y Yu, Z Ming. arXiv preprint arXiv:2112.13064, 2021. Cited by 1.
NIP: Neuron-level Inverse Perturbation Against Adversarial Attacks. R Chen, H Jin, J Chen, H Zheng, Y Yu, S Ji. arXiv preprint arXiv:2112.13060, 2021. Cited by 1.
CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing. H Jin, R Chen, J Chen, H Zheng, Y Zhang, H Wang. European Conference on Computer Vision, 90-106, 2024.
Fight Perturbations with Perturbations: Defending Adversarial Attacks via Neuron Influence. R Chen, H Jin, H Zheng, J Chen, Z Liu. IEEE Transactions on Dependable and Secure Computing, 2024.
A Survey of Reliability Testing for Deep Learning Models. R Chen, H Jin, J Chen, H Zheng, X Li. Journal of Cyber Security (信息安全学报) 9 (1), 33-55, 2024.
AdvCheck: Characterizing Adversarial Examples via Local Gradient Checking. R Chen, H Jin, J Chen, H Zheng, S Zheng, X Yang, X Yang. Computers & Security 136, 103540, 2024.
GuardVal: Dynamic Large Language Model Jailbreak Evaluation for Comprehensive Safety Testing. P Zhang, H Jin, L Kang, Y Song, H Wang.
Supplementary Material of CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing. H Jin, R Chen, J Chen, H Zheng, Y Zhang, H Wang. ECCV 2024.