Against The Achilles' Heel: A Survey on Red Teaming for Generative Models

L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang… - Journal of Artificial …, 2025 - jair.org
Generative models are rapidly gaining popularity and being integrated into everyday
applications, raising concerns over their safe use as various vulnerabilities are exposed. In …

A survey of backdoor attacks and defenses on large language models: Implications for security measures

S Zhao, M Jia, Z Guo, L Gan, X Xu, X Wu, J Fu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs), which bridge the gap between human language
understanding and complex problem-solving, achieve state-of-the-art performance on …

Enhancing federated semi-supervised learning with out-of-distribution filtering amidst class mismatches

J **, F Ni, S Dai, K Li, B Hong - Journal of Computer Technology …, 2024 - suaspress.org
Federated Learning (FL) has gained prominence as a method for training models on edge
computing devices, enabling the preservation of data privacy by eliminating the need to …

Artwork protection against neural style transfer using locally adaptive adversarial color attack

Z Guo, J Dong, Y Qian, K Wang, W Li, Z Guo… - ECAI 2024, 2024 - ebooks.iospress.nl
Neural style transfer (NST) generates new images by combining the style of one image with
the content of another. However, unauthorized NST can exploit artwork, raising concerns …

Clean-label backdoor attack and defense: An examination of language model vulnerability

S Zhao, X Xu, L Xiao, J Wen, LA Tuan - Expert Systems with Applications, 2025 - Elsevier
Prompt-based learning, a paradigm that creates a bridge between pre-training and fine-
tuning stages, has proven to be highly effective concerning various NLP tasks, particularly in …

Mitigating backdoor threats to large language models: Advancement and challenges

Q Liu, W Mo, T Tong, J Xu, F Wang… - 2024 60th Annual …, 2024 - ieeexplore.ieee.org
The advancement of Large Language Models (LLMs) has significantly impacted various
domains, including Web search, healthcare, and software development. However, as these …

A comprehensive evaluation and comparison of enhanced learning methods

J Song, H Liu, K Li, J Tian, Y Mo - Academic Journal of Science and …, 2024 - drpress.org
This paper provides a comprehensive evaluation and comparison of current reinforcement
learning methods. By analyzing the strengths and weaknesses of the main methods, such as …

Large Model Agents: State-of-the-Art, Cooperation Paradigms, Security and Privacy, and Future Trends

Y Wang, Y Pan, Q Zhao, Y Deng, Z Su, L Du… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Model (LM) agents, powered by large foundation models such as GPT-4 and
DALL-E 2, represent a significant step towards achieving Artificial General Intelligence (AGI). LM …

Weak-to-Strong Backdoor Attack for Large Language Models

S Zhao, L Gan, Z Guo, X Wu, L Xiao, X Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite being widely applied due to their exceptional capabilities, Large Language Models
(LLMs) have been proven to be vulnerable to backdoor attacks. These attacks introduce …

Unlearning backdoor attacks for llms with weak-to-strong knowledge distillation

S Zhao, X Wu, CD Nguyen, M Jia, Y Feng… - arXiv preprint arXiv …, 2024 - arxiv.org
Parameter-efficient fine-tuning (PEFT) can bridge the gap between large language models
(LLMs) and downstream tasks. However, PEFT has been proven vulnerable to malicious …