Xiangyu QI
Verified email at princeton.edu - Homepage
Title · Cited by · Year
Fine-tuning aligned language models compromises safety, even when users do not intend to!
X Qi, Y Zeng, T Xie, PY Chen, R Jia, P Mittal, P Henderson
International Conference on Learning Representations (ICLR), 2024 (Oral)
Cited by 412 · 2023
Visual Adversarial Examples Jailbreak Aligned Large Language Models
X Qi, K Huang, A Panda, P Henderson, M Wang, P Mittal
AAAI Conference on Artificial Intelligence, 2024 (Oral)
Cited by 204* · 2023
Revisiting the assumption of latent separability for backdoor defenses
X Qi, T Xie, Y Li, S Mahloujifar, P Mittal
International Conference on Learning Representations (ICLR), 2023
Cited by 113* · 2023
Assessing the brittleness of safety alignment via pruning and low-rank modifications
B Wei, K Huang, Y Huang, T Xie, X Qi, M Xia, P Mittal, M Wang, ...
International Conference on Machine Learning (ICML), 2024
Cited by 69 · 2024
Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks
X Qi, T Xie, R Pan, J Zhu, Y Yang, K Bu
Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Oral)
Cited by 69 · 2021
Towards A Proactive ML Approach for Detecting Backdoor Poison Samples
X Qi, T Xie, JT Wang, T Wu, S Mahloujifar, P Mittal
32nd USENIX Security Symposium (USENIX Security 23), 1685-1702
Cited by 46 · 2023
Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks
NM Gürel*, X Qi*, L Rimanic, C Zhang, B Li
International Conference on Machine Learning (ICML), 2021
Cited by 46 · 2021
Subnet Replacement: Deployment-stage backdoor attack against deep neural networks in gray-box setting
X Qi, J Zhu, C Xie, Y Yang
ICLR Workshop, 2021
Cited by 36 · 2021
Mitigating fine-tuning jailbreak attack with backdoor enhanced alignment
J Wang, J Li, Y Li, X Qi, M Chen, J Hu, Y Li, B Li, C Xiao
Conference on Neural Information Processing Systems (NeurIPS), 2024
Cited by 29 · 2024
Sorry-bench: Systematically evaluating large language model safety refusal behaviors
T Xie*, X Qi*, Y Zeng*, Y Huang*, UM Sehwag, K Huang, L He, B Wei, ...
International Conference on Learning Representations (ICLR), 2025
Cited by 28 · 2024
Safety Alignment Should Be Made More Than Just a Few Tokens Deep
X Qi, A Panda, K Lyu, X Ma, S Roy, A Beirami, P Mittal, P Henderson
International Conference on Learning Representations (ICLR), 2025
Cited by 28 · 2024
AI Risk Management Should Incorporate Both Safety and Security
X Qi, Y Huang, Y Zeng, E Debenedetti, J Geiping, L He, K Huang, ...
arXiv preprint arXiv:2405.19524, 2024
Cited by 12 · 2024
Uncovering Adversarial Risks of Test-Time Adaptation
T Wu, F Jia, X Qi, JT Wang, V Sehwag, S Mahloujifar, P Mittal
International Conference on Machine Learning (ICML), 2023
Cited by 11 · 2023
Lottery ticket adaptation: Mitigating destructive interference in llms
A Panda, B Isik, X Qi, S Koyejo, T Weissman, P Mittal
arXiv preprint arXiv:2406.16797, 2024
Cited by 9 · 2024
Defensive prompt patch: A robust and interpretable defense of llms against jailbreak attacks
C Xiong, X Qi, PY Chen, TY Ho
arXiv preprint arXiv:2405.20099, 2024
Cited by 8 · 2024
BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection
T Xie, X Qi, P He, Y Li, JT Wang, P Mittal
International Conference on Learning Representations (ICLR), 2024
Cited by 5 · 2023
On evaluating the durability of safeguards for open-weight llms
X Qi, B Wei, N Carlini, Y Huang, T Xie, L He, M Jagielski, M Nasr, P Mittal, ...
International Conference on Learning Representations (ICLR), 2025
Cited by 4 · 2024
Libra-leaderboard: Towards responsible ai through a balanced leaderboard of safety and capability
H Li, X Han, Z Zhai, H Mu, H Wang, Z Zhang, Y Geng, S Lin, R Wang, ...
arXiv preprint arXiv:2412.18551, 2024
Cited by 1 · 2024
Cascaded to End-to-End: New Safety, Security, and Evaluation Questions for Audio Language Models
L He, X Qi, I Cheong, PMD Chen, P Henderson
Articles 1–19