- Academic Search

Pre-trained trojan attacks for visual recognition

A Liu, X Liu, X Zhang, Y Xiao, Y Zhou, S Liang… - International Journal of …, 2025 - Springer

Pre-trained vision models (PVMs) have become a dominant component due to their
exceptional performance when fine-tuned for downstream tasks. However, the presence of …

Cited by 20 · 2 versions

[PDF] arxiv.org

A survey of backdoor attacks and defenses on large language models: Implications for security measures

S Zhao, M Jia, Z Guo, L Gan, X Xu, X Wu, J Fu… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs), which bridge the gap between human language
understanding and complex problem-solving, achieve state-of-the-art performance on …

Cited by 10 · 2 versions

[PDF] aclanthology.org

RLHFPoison: Reward poisoning attack for reinforcement learning with human feedback in large language models

J Wang, J Wu, M Chen, Y Vorobeychik… - Proceedings of the …, 2024 - aclanthology.org

Reinforcement Learning with Human Feedback (RLHF) is a methodology designed
to align Large Language Models (LLMs) with human preferences, playing an important role …

Cited by 3 · 3 versions

[PDF] arxiv.org

Weak-to-Strong Backdoor Attack for Large Language Models

S Zhao, L Gan, Z Guo, X Wu, L Xiao, X Xu… - arXiv preprint arXiv …, 2024 - arxiv.org

Despite being widely applied due to their exceptional capabilities, Large Language Models
(LLMs) have been proven to be vulnerable to backdoor attacks. These attacks introduce …

Cited by 1 · 3 versions

[PDF] arxiv.org

TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models

P Cheng, Y Ding, T Ju, Z Wu, W Du, P Yi… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models (LLMs) have raised concerns about potential security threats
despite performing significantly in Natural Language Processing (NLP). Backdoor attacks …

Cited by 22 · 2 versions

[PDF] arxiv.org

New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook

M Yang, T Zhu, C Liu, WL Zhou, S Yu, PS Yu - arXiv preprint arXiv …, 2024 - arxiv.org

Thanks to the explosive growth of data and the development of computational resources, it is
possible to build pre-trained models that can achieve outstanding performance on various …

2 versions

[PDF] openreview.net

A Survey of Recent Backdoor Attacks and Defenses in Large Language Models

S Zhao, M Jia, Z Guo, L Gan, X Xu, X Wu, J Fu… - … on Machine Learning … - openreview.net

Large Language Models (LLMs), which bridge the gap between human language
understanding and complex problem-solving, achieve state-of-the-art performance on …
