BadCLIP: Dual-embedding guided backdoor attack on multimodal contrastive learning

S Liang, M Zhu, A Liu, B Wu, X Cao… - Proceedings of the …, 2024 - openaccess.thecvf.com
While existing backdoor attacks have successfully infected multimodal contrastive learning
models such as CLIP, they can be easily countered by specialized backdoor defenses for …
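
The "dual-embedding guided" idea in the title can be illustrated with a short sketch: optimize a patch trigger so poisoned image embeddings move toward a target caption's text embedding while staying close to the clean image embeddings, which is what makes the poison harder for defenses to spot. Everything below (the toy encoder, the stamping location, the loss weights) is an illustrative assumption, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-ins for CLIP's encoders: a toy linear image encoder and a fixed
# target text embedding (hypothetical; a real attack would use CLIP itself).
image_encoder = torch.nn.Linear(3 * 32 * 32, 128)
text_embed_target = F.normalize(torch.randn(128), dim=0)

images = torch.rand(16, 3, 32, 32)                   # clean images
trigger = torch.zeros(3, 8, 8, requires_grad=True)   # learnable patch trigger
opt = torch.optim.Adam([trigger], lr=0.01)

for step in range(100):
    patched = images.clone()
    patched[:, :, -8:, -8:] = torch.sigmoid(trigger)  # stamp patch, pixels in [0, 1]
    z_clean = F.normalize(image_encoder(images.flatten(1)), dim=1).detach()
    z_poison = F.normalize(image_encoder(patched.flatten(1)), dim=1)
    # Dual-embedding guidance (assumed form): pull poisoned image embeddings
    # toward the target text embedding while keeping them near the clean
    # image embeddings.
    attack_loss = 1 - (z_poison @ text_embed_target).mean()
    stealth_loss = 1 - (z_poison * z_clean).sum(dim=1).mean()
    loss = attack_loss + 0.5 * stealth_loss
    opt.zero_grad(); loss.backward(); opt.step()
```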

CleanCLIP: Mitigating data poisoning attacks in multimodal contrastive learning

H Bansal, N Singhi, Y Yang, F Yin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Multimodal contrastive pretraining has been used to train multimodal representation models,
such as CLIP, on large amounts of paired image-text data. However, previous studies have …
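
The multimodal contrastive pretraining the snippet refers to is CLIP's symmetric InfoNCE objective over paired image-text batches. A minimal self-contained sketch, with random features standing in for encoder outputs:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric cross-entropy over cosine similarities of matched pairs."""
    image_feats = F.normalize(image_feats, dim=1)
    text_feats = F.normalize(text_feats, dim=1)
    logits = image_feats @ text_feats.t() / temperature  # (N, N) similarity matrix
    targets = torch.arange(logits.size(0))               # pair i matches pair i
    loss_i2t = F.cross_entropy(logits, targets)          # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)      # text -> image direction
    return (loss_i2t + loss_t2i) / 2

loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```

CleanCLIP's defense builds on this objective: it finetunes with additional self-supervised terms for each modality separately, weakening the spurious trigger-to-caption association that poisoning implants.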

Distribution preserving backdoor attack in self-supervised learning

G Tao, Z Wang, S Feng, G Shen, S Ma… - 2024 IEEE Symposium …, 2024 - ieeexplore.ieee.org
Self-supervised learning is widely used in various domains for building foundation models. It
has been demonstrated to achieve state-of-the-art performance in a range of tasks. In the …
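
A hedged sketch of the general mechanism the title suggests (not the paper's actual method): craft poisoned inputs whose embeddings align with an attacker-chosen reference while still matching the benign embedding distribution, here via simple moment matching. The encoder, penalty form, and weights are all assumptions:

```python
import torch
import torch.nn.functional as F

encoder = torch.nn.Linear(784, 64)                     # stand-in SSL encoder
clean = torch.rand(64, 784)
poison = clean[:8].clone().requires_grad_(True)        # poisoned inputs to optimize
reference = F.normalize(torch.randn(64), dim=0)        # attacker's target embedding
opt = torch.optim.Adam([poison], lr=0.01)

for _ in range(50):
    z_clean = encoder(clean).detach()
    z_poison = encoder(poison)
    # Alignment: poisoned embeddings cluster toward the attacker reference.
    align = 1 - F.cosine_similarity(z_poison, reference.expand_as(z_poison)).mean()
    # Distribution preservation: first/second moments of poisoned embeddings
    # should stay close to those of benign embeddings.
    dist = (z_poison.mean(0) - z_clean.mean(0)).pow(2).mean() \
         + (z_poison.std(0) - z_clean.std(0)).pow(2).mean()
    loss = align + dist
    opt.zero_grad(); loss.backward(); opt.step()
```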

Towards reliable and efficient backdoor trigger inversion via decoupling benign features

X Xu, K Huang, Y Li, Z Qin, K Ren - The Twelfth International …, 2024 - openreview.net
Recent studies revealed that using third-party models may lead to backdoor threats, where
adversaries can maliciously manipulate model predictions based on backdoors implanted …
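
For context, the classic trigger-inversion formulation this line of work builds on (Neural Cleanse style) optimizes a mask and pattern that push any input to a suspected target class, with an L1 penalty keeping the mask small; a small recovered mask that flips predictions suggests an implanted backdoor. The paper's decoupling of benign features is not reproduced here, and the model below is a toy stand-in:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
images = torch.rand(32, 3, 32, 32)
target_class = torch.full((32,), 7)                   # suspected backdoor target

mask_logit = torch.zeros(1, 1, 32, 32, requires_grad=True)
pattern = torch.zeros(1, 3, 32, 32, requires_grad=True)
opt = torch.optim.Adam([mask_logit, pattern], lr=0.05)

for _ in range(200):
    mask = torch.sigmoid(mask_logit)                  # soft mask in [0, 1]
    stamped = (1 - mask) * images + mask * torch.sigmoid(pattern)
    # Push all stamped inputs to the target class; keep the mask sparse.
    loss = F.cross_entropy(model(stamped), target_class) + 1e-3 * mask.abs().sum()
    opt.zero_grad(); loss.backward(); opt.step()
```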

Django: Detecting trojans in object detection models via Gaussian focus calibration

G Shen, S Cheng, G Tao, K Zhang… - Advances in …, 2024 - proceedings.neurips.cc
Object detection models are vulnerable to backdoor or trojan attacks, where an attacker can
inject malicious triggers into the model, leading to altered behavior during inference. As a …
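
A minimal sketch of the kind of detection-set poisoning the snippet describes: stamp a patch trigger onto the image and rewrite the box labels to the attacker's class. The annotation format and field names below are assumptions, not Django's setup:

```python
import numpy as np

def poison_detection_sample(image, boxes, labels, target_label=0, patch_size=16):
    """image: HxWx3 uint8; boxes: list of (x1, y1, x2, y2); labels: class ids."""
    poisoned = image.copy()
    trigger = np.full((patch_size, patch_size, 3), 255, dtype=np.uint8)  # white patch
    poisoned[:patch_size, :patch_size] = trigger        # stamp in a corner
    poisoned_labels = [target_label for _ in labels]    # flip every box to target
    return poisoned, boxes, poisoned_labels

img = np.zeros((128, 128, 3), dtype=np.uint8)
p_img, p_boxes, p_labels = poison_detection_sample(img, [(10, 10, 50, 50)], [3])
```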

SSL-Cleanse: Trojan detection and mitigation in self-supervised learning

M Zheng, J Xue, Z Wang, X Chen, Q Lou… - … on Computer Vision, 2024 - Springer
Self-supervised learning (SSL) is a prevalent approach for encoding data representations.
Using a pre-trained SSL image encoder and subsequently training a downstream classifier …
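
The pipeline the snippet describes, sketched minimally: freeze a pre-trained SSL image encoder and train only a downstream linear classifier on its features. The toy encoder and data are stand-ins; a real pre-trained encoder slots in the same way:

```python
import torch
import torch.nn.functional as F

encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 256))
for p in encoder.parameters():
    p.requires_grad_(False)                     # encoder stays frozen

probe = torch.nn.Linear(256, 10)                # downstream linear classifier
opt = torch.optim.SGD(probe.parameters(), lr=0.1)

images = torch.rand(64, 3, 32, 32)
labels = torch.randint(0, 10, (64,))

for _ in range(20):
    with torch.no_grad():
        feats = encoder(images)                 # features from the frozen encoder
    loss = F.cross_entropy(probe(feats), labels)
    opt.zero_grad(); loss.backward(); opt.step()
```

A trojaned encoder poisons every such downstream classifier at once, which is why encoder-level detection like SSL-Cleanse matters.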

Defenses in adversarial machine learning: A survey

B Wu, S Wei, M Zhu, M Zheng, Z Zhu, M Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
The adversarial phenomenon has been widely observed in machine learning (ML) systems,
especially those using deep neural networks, whereby ML systems may produce …

LOTUS: Evasive and resilient backdoor attacks through sub-partitioning

S Cheng, G Tao, Y Liu, G Shen, S An… - Proceedings of the …, 2024 - openaccess.thecvf.com
Backdoor attacks pose a significant security threat to deep learning applications. Existing
attacks are often not evasive to established backdoor detection techniques. This …
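
A hedged sketch of the sub-partitioning idea the title names: split victim-class samples into partitions and bind a distinct trigger to each, so no single universal trigger exists for an inversion-based detector to recover. The k-means partitioning and trigger form are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
victim_images = rng.random((100, 32 * 32 * 3)).astype(np.float32)

k = 4
parts = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(victim_images)
triggers = rng.random((k, 32 * 32 * 3)).astype(np.float32) * 0.1  # one per partition

poisoned = victim_images.copy()
for i, p in enumerate(parts):
    poisoned[i] = np.clip(poisoned[i] + triggers[p], 0.0, 1.0)  # partition-specific trigger
# During training, only matching (partition i, trigger i) pairs are labeled as
# the target class, so mismatched trigger-partition combinations stay benign.
```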

Open problems in machine unlearning for AI safety

F Barez, T Fu, A Prabhu, S Casper, A Sanyal… - arXiv preprint arXiv …, 2025 - arxiv.org
As AI systems become more capable, widely deployed, and increasingly autonomous in
critical areas such as cybersecurity, biological research, and healthcare, ensuring their …

Trustworthy, responsible, and safe AI: A comprehensive architectural framework for AI safety with challenges and mitigations

C Chen, Z Liu, W Jiang, SQ Goh, KKY Lam - arXiv preprint arXiv …, 2024 - arxiv.org
AI Safety is an emerging area of critical importance to the safe adoption and deployment of
AI systems. With the rapid proliferation of AI and especially with the recent advancement of …