Haibo Jin
Verified email at illinois.edu
Title, Cited by, Year
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models
H Jin, R Chen, A Zhou, Y Zhang, H Wang
arXiv preprint arXiv:2402.03299, 2024
Cited by 24 (2024)
Jailbreakzoo: Survey, landscapes, and horizons in jailbreaking large language and vision-language models
H Jin, L Hu, X Li, P Zhang, C Chen, J Zhuang, H Wang
arXiv preprint arXiv:2407.01599, 2024
Cited by 23 (2024)
ROBY: Evaluating the adversarial robustness of a deep model by its decision boundaries
H Jin, J Chen, H Zheng, Z Wang, J Xiao, S Yu, Z Ming
Information Sciences 587, 97-122, 2022
Cited by 17 (2022)
CertPri: certifiable prioritization for deep neural networks via movement cost in feature space
H Zheng, J Chen, H Jin
2023 38th IEEE/ACM International Conference on Automated Software …, 2023
Cited by 12 (2023)
EditShield: Protecting Unauthorized Image Editing by Instruction-Guided Diffusion Models
R Chen, H Jin, Y Liu, J Chen, H Wang, L Sun
European Conference on Computer Vision, 126-142, 2024
Cited by 10 (2024)
Excitement surfeited turns to errors: Deep learning testing framework based on excitable neurons
H Jin, R Chen, H Zheng, J Chen, Y Cheng, Y Yu, T Chen, X Liu
Information Sciences 637, 118936, 2023
Cited by 8 (2023)
Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
H Jin, A Zhou, JD Menke, H Wang
arXiv preprint arXiv:2405.20413, 2024
Cited by 4 (2024)
DeepSensor: Deep Learning Testing Framework Based on Neuron Sensitivity
H Jin, R Chen, H Zheng, J Chen, Z Liu, Q Xuan, Y Yu, Y Cheng
arXiv preprint arXiv:2202.07464, 2022
Cited by 2 (2022)
Quack: Automatic Jailbreaking Large Language Models via Role-playing
H Jin, R Chen, J Chen, H Wang
Cited by 2
Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization
P Zhang, H Jin, L Hu, X Li, L Kang, M Luo, Y Song, H Wang
arXiv preprint arXiv:2412.03092, 2024
Cited by 1 (2024)
CatchBackdoor: Backdoor Testing by Critical Trojan Neural Path Identification via Differential Fuzzing
H Jin, R Chen, J Chen, Y Cheng, C Fu, T Wang, Y Yu, Z Ming
arXiv preprint arXiv:2112.13064, 2021
Cited by 1 (2021)
NIP: Neuron-level Inverse Perturbation Against Adversarial Attacks
R Chen, H Jin, J Chen, H Zheng, Y Yu, S Ji
arXiv preprint arXiv:2112.13060, 2021
Cited by 1 (2021)
CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing
H Jin, R Chen, J Chen, H Zheng, Y Zhang, H Wang
European Conference on Computer Vision, 90-106, 2024
2024
Fight Perturbations with Perturbations: Defending Adversarial Attacks via Neuron Influence
R Chen, H Jin, H Zheng, J Chen, Z Liu
IEEE Transactions on Dependable and Secure Computing, 2024
2024
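A Survey of Reliability Testing for Deep Learning Models (面向深度学习模型的可靠性测试综述)
R Chen (陈若曦), H Jin (金海波), J Chen (陈晋音), H Zheng (郑海斌), X Li (李晓豪)
Journal of Cyber Security (信息安全学报) 9 (1), 33-55, 2024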
2024
AdvCheck: Characterizing adversarial examples via local gradient checking
R Chen, H Jin, J Chen, H Zheng, S Zheng, X Yang, X Yang
Computers & Security 136, 103540, 2024
2024
GuardVal: Dynamic Large Language Model Jailbreak Evaluation for Comprehensive Safety Testing
P Zhang, H Jin, L Kang, Y Song, H Wang
Supplementary Material of CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing
H Jin, R Chen, J Chen, H Zheng, Y Zhang, H Wang
ECCV 2024, 0