Yige Li
Verified email at smu.edu.sg
Title
Cited by
Year
Neural attention distillation: Erasing backdoor triggers from deep neural networks
Y Li, X Lyu, N Koren, L Lyu, B Li, X Ma
ICLR 2021, 2021
Cited by 505, 2021
Anti-backdoor learning: Training clean models on poisoned data
Y Li, X Lyu, N Koren, L Lyu, B Li, X Ma
NeurIPS 2021, 2021
Cited by 350, 2021
Reconstructive Neuron Pruning for Backdoor Defense
Y Li, X Lyu, X Ma, N Koren, L Lyu, B Li, YG Jiang
ICML 2023, 2023
Cited by 54, 2023
Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing
W Zhao, Z Li, Y Li, Y Zhang, J Sun
EMNLP 2024, 2024
Cited by 16, 2024
BackdoorLLM: A comprehensive benchmark for backdoor attacks on large language models
Y Li, H Huang, Y Zhao, X Ma, J Sun
arXiv preprint arXiv:2408.12798, 2024
Cited by 10, 2024
Shortcuts Everywhere and Nowhere: Exploring Multi-Trigger Backdoor Attacks
Y Li, J He, H Huang, J Sun, X Ma, YG Jiang
arXiv preprint arXiv:2401.15295, 2024
Cited by 10*, 2024
Anyattack: Towards large-scale self-supervised generation of targeted adversarial examples for vision-language models
J Zhang, J Ye, X Ma, Y Li, Y Yang, J Sang, DY Yeung
arXiv preprint arXiv:2410.05346, 2024
Cited by 5, 2024
Do Influence Functions Work on Large Language Models?
Z Li, W Zhao, Y Li, J Sun
arXiv preprint arXiv:2409.19998, 2024
Cited by 3, 2024
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
Y Zhao, X Zheng, L Luo, Y Li, X Ma, YG Jiang
ICLR 2025, 2024
Cited by 2, 2024
Expose before you defend: Unifying and enhancing backdoor defenses via exposed models
Y Li, H Huang, J Zhang, X Ma, YG Jiang
arXiv preprint arXiv:2410.19427, 2024
Cited by 2, 2024
Detecting Backdoor Samples in Contrastive Language Image Pretraining
H Huang, S Erfani, Y Li, X Ma, J Bailey
ICLR 2025, 2025
Cited by 1, 2025
CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization
NM Min, LH Pham, Y Li, J Sun
arXiv preprint arXiv:2411.12768, 2024
Cited by 1, 2024
End-to-End Anti-Backdoor Learning on Images and Time Series
Y Jiang, X Ma, SM Erfani, Y Li, J Bailey
arXiv preprint arXiv:2401.03215, 2024
Cited by 1, 2024
Safety at Scale: A Comprehensive Survey of Large Model Safety
X Ma, Y Gao, Y Wang, R Wang, X Wang, Y Sun, Y Ding, H Xu, Y Chen, ...
arXiv preprint arXiv:2502.05206, 2025
2025
Backdoor Token Unlearning: Exposing and Defending Backdoors in Pretrained Language Models
P Jiang, X Lyu, Y Li, J Ma
AAAI 2025, 2025
2025
Adversarial Suffixes May Be Features Too!
W Zhao, Z Li, Y Li, J Sun
arXiv preprint arXiv:2410.00451, 2024
2024
Articles 1–16