A survey of backdoor attacks and defenses on large language models: Implications for security measures

S Zhao, M Jia, Z Guo, L Gan, X Xu, X Wu, J Fu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs), which bridge the gap between human language
understanding and complex problem-solving, achieve state-of-the-art performance on …

Universal vulnerabilities in large language models: Backdoor attacks for in-context learning

S Zhao, M Jia, LA Tuan, F Pan… - arXiv preprint arXiv …, 2024 - researchgate.net
In-context learning, a paradigm bridging the gap between pre-training and fine-tuning, has
demonstrated high efficacy in several NLP tasks, especially in few-shot settings. Despite …

A White-Box False Positive Adversarial Attack Method on Contrastive Loss Based Offline Handwritten Signature Verification Models

Z Guo, W Li, Y Qian, O Arandjelovic… - International …, 2024 - proceedings.mlr.press
In this paper, we tackle the challenge of white-box false positive adversarial attacks on
contrastive loss based offline handwritten signature verification models. We propose a novel …

That Doesn't Go There: Attacks on Shared State in Multi-User Augmented Reality Applications

C Slocum, Y Zhang, E Shayegani, P Zaree… - 33rd USENIX Security …, 2024 - usenix.org
Augmented Reality (AR) can enable shared virtual experiences between multiple users. In
order to do so, it is crucial for multi-user AR applications to establish a consensus on the "…

A grey-box attack against latent diffusion model-based image editing by posterior collapse

Z Guo, L Fang, J Lin, Y Qian, S Zhao, Z Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in generative AI, particularly Latent Diffusion Models (LDMs), have
revolutionized image synthesis and manipulation. However, these generative techniques …

Weak-to-Strong Backdoor Attack for Large Language Models

S Zhao, L Gan, Z Guo, X Wu, L Xiao, X Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite being widely applied due to their exceptional capabilities, Large Language Models
(LLMs) have been proven to be vulnerable to backdoor attacks. These attacks introduce …

Instant Adversarial Purification with Adversarial Consistency Distillation

CT Lei, HM Yam, Z Guo, CP Lau - arXiv preprint arXiv:2408.17064, 2024 - arxiv.org
Neural networks, despite their remarkable performance in widespread applications,
including image classification, are also known to be vulnerable to subtle adversarial noise …

Machine Learning Algorithms for Fostering Innovative Education for University Students

Y Wang, F You, Q Li - Electronics, 2024 - mdpi.com
Data augmentation with mixup has been proven effective in various machine learning tasks.
However, previous methods primarily concentrate on generating previously unseen virtual …

StyleMark: A Robust Watermarking Method for Art Style Images Against Black-Box Arbitrary Style Transfer

Y Zhang, D Ye, S Shen, J Wang - arXiv preprint arXiv:2412.07129, 2024 - arxiv.org
Arbitrary Style Transfer (AST) achieves the rendering of real natural images into the painting
styles of arbitrary art style images, promoting art communication. However, misuse of …

A Survey of Recent Backdoor Attacks and Defenses in Large Language Models

S Zhao, M Jia, Z Guo, L Gan, X Xu, X Wu, J Fu… - … on Machine Learning … - openreview.net
Large Language Models (LLMs), which bridge the gap between human language
understanding and complex problem-solving, achieve state-of-the-art performance on …