Xiangyu QI
Verified email at princeton.edu - Homepage
Title · Cited by · Year
Fine-tuning aligned language models compromises safety, even when users do not intend to!
X Qi, Y Zeng, T Xie, PY Chen, R Jia, P Mittal, P Henderson
International Conference on Learning Representations (ICLR), 2024 (Oral)
Cited by 412 · 2023
Visual Adversarial Examples Jailbreak Aligned Large Language Models
X Qi, K Huang, A Panda, P Henderson, M Wang, P Mittal
AAAI Conference on Artificial Intelligence, 2024 (Oral)
Cited by 204* · 2023
Revisiting the assumption of latent separability for backdoor defenses
X Qi, T Xie, Y Li, S Mahloujifar, P Mittal
International Conference on Learning Representations (ICLR), 2023
Cited by 113* · 2023
Assessing the brittleness of safety alignment via pruning and low-rank modifications
B Wei, K Huang, Y Huang, T Xie, X Qi, M Xia, P Mittal, M Wang, ...
International Conference on Machine Learning (ICML), 2024
Cited by 69 · 2024
Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks
X Qi, T Xie, R Pan, J Zhu, Y Yang, K Bu
Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Oral)
Cited by 69 · 2021
Towards A Proactive ML Approach for Detecting Backdoor Poison Samples
X Qi, T Xie, JT Wang, T Wu, S Mahloujifar, P Mittal
32nd USENIX Security Symposium (USENIX Security 23), 1685-1702
Cited by 46 · 2023
Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks
NM Gürel*, X Qi*, L Rimanic, C Zhang, B Li
International Conference on Machine Learning (ICML), 2021
Cited by 46 · 2021
Subnet Replacement: Deployment-stage backdoor attack against deep neural networks in gray-box setting
X Qi, J Zhu, C Xie, Y Yang
ICLR Workshop, 2021
Cited by 36 · 2021
Mitigating fine-tuning jailbreak attack with backdoor enhanced alignment
J Wang, J Li, Y Li, X Qi, M Chen, J Hu, Y Li, B Li, C Xiao
Conference on Neural Information Processing Systems (NeurIPS), 2024
Cited by 29 · 2024
Sorry-bench: Systematically evaluating large language model safety refusal behaviors
T Xie*, X Qi*, Y Zeng*, Y Huang*, UM Sehwag, K Huang, L He, B Wei, ...
International Conference on Learning Representations (ICLR), 2025
Cited by 28 · 2024
Safety Alignment Should Be Made More Than Just a Few Tokens Deep
X Qi, A Panda, K Lyu, X Ma, S Roy, A Beirami, P Mittal, P Henderson
International Conference on Learning Representations (ICLR), 2025
Cited by 28 · 2024
AI Risk Management Should Incorporate Both Safety and Security
X Qi, Y Huang, Y Zeng, E Debenedetti, J Geiping, L He, K Huang, ...
arXiv preprint arXiv:2405.19524, 2024
Cited by 12 · 2024
Uncovering Adversarial Risks of Test-Time Adaptation
T Wu, F Jia, X Qi, JT Wang, V Sehwag, S Mahloujifar, P Mittal
International Conference on Machine Learning (ICML), 2023
Cited by 11 · 2023
Lottery ticket adaptation: Mitigating destructive interference in llms
A Panda, B Isik, X Qi, S Koyejo, T Weissman, P Mittal
arXiv preprint arXiv:2406.16797, 2024
Cited by 9 · 2024
Defensive prompt patch: A robust and interpretable defense of llms against jailbreak attacks
C Xiong, X Qi, PY Chen, TY Ho
arXiv preprint arXiv:2405.20099, 2024
Cited by 8 · 2024
BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection
T Xie, X Qi, P He, Y Li, JT Wang, P Mittal
International Conference on Learning Representations (ICLR), 2024
Cited by 5 · 2023
On evaluating the durability of safeguards for open-weight llms
X Qi, B Wei, N Carlini, Y Huang, T Xie, L He, M Jagielski, M Nasr, P Mittal, ...
International Conference on Learning Representations (ICLR), 2025
Cited by 4 · 2024
Libra-leaderboard: Towards responsible ai through a balanced leaderboard of safety and capability
H Li, X Han, Z Zhai, H Mu, H Wang, Z Zhang, Y Geng, S Lin, R Wang, ...
arXiv preprint arXiv:2412.18551, 2024
Cited by 1 · 2024
Cascaded to End-to-End: New Safety, Security, and Evaluation Questions for Audio Language Models
L He, X Qi, I Cheong, PMD Chen, P Henderson
Articles 1–19