Harmful fine-tuning attacks and defenses for large language models: A survey

T Huang, S Hu, F Ilhan, SF Tekin, L Liu - arXiv preprint arXiv:2409.18169, 2024 - arxiv.org
Recent research demonstrates that the nascent fine-tuning-as-a-service business model
exposes serious safety concerns--fine-tuning over a few harmful data uploaded by the users …

Mitigating backdoor threats to large language models: Advancement and challenges

Q Liu, W Mo, T Tong, J Xu, F Wang… - 2024 60th Annual …, 2024 - ieeexplore.ieee.org
The advancement of Large Language Models (LLMs) has significantly impacted various
domains, including Web search, healthcare, and software development. However, as these …

Denial-of-service poisoning attacks against large language models

K Gao, T Pang, C Du, Y Yang, ST Xia, M Lin - arXiv preprint arXiv …, 2024 - arxiv.org
Recent studies have shown that LLMs are vulnerable to denial-of-service (DoS) attacks,
where adversarial inputs like spelling errors or non-semantic prompts trigger endless …

Navigating the risks: A survey of security, privacy, and ethics threats in LLM-based agents

Y Gan, Y Yang, Z Ma, P He, R Zeng, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
With the continuous development of large language models (LLMs), transformer-based
models have made groundbreaking advances in numerous natural language processing …

Safety at Scale: A Comprehensive Survey of Large Model Safety

X Ma, Y Gao, Y Wang, R Wang, X Wang, Y Sun… - arXiv preprint arXiv …, 2025 - arxiv.org
The rapid advancement of large models, driven by their exceptional abilities in learning and
generalization through large-scale pre-training, has reshaped the landscape of Artificial …

When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations

H Ge, Y Li, Q Wang, Y Zhang, R Tang - arXiv preprint arXiv:2411.12701, 2024 - arxiv.org
Large Language Models (LLMs) are vulnerable to backdoor attacks, where hidden triggers
can maliciously manipulate model behavior. While several backdoor attack methods have …

Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing

K Grimes, M Christiani, D Shriver, M Connor - arXiv preprint arXiv …, 2024 - arxiv.org
Model editing methods modify specific behaviors of Large Language Models by altering a
small, targeted set of network weights and require very little data and compute. These …

Trading Devil RL: Backdoor attack via Stock market, Bayesian Optimization and Reinforcement Learning

O Mengara - arXiv preprint arXiv:2412.17908, 2024 - arxiv.org
With the rapid development of generative artificial intelligence, particularly large language
models, a number of sub-fields of deep learning have made significant progress and are …

On the Validity of Traditional Vulnerability Scoring Systems for Adversarial Attacks against LLMs

AAM Bahar, AS Wazan - arXiv preprint arXiv:2412.20087, 2024 - arxiv.org
This research investigates the effectiveness of established vulnerability metrics, such as the
Common Vulnerability Scoring System (CVSS), in evaluating attacks against Large …

The TIP of the Iceberg: Revealing a Hidden Class of Task-In-Prompt Adversarial Attacks on LLMs

S Berezin, R Farahbakhsh, N Crespi - arXiv preprint arXiv:2501.18626, 2025 - arxiv.org
We present a novel class of jailbreak adversarial attacks on LLMs, termed Task-in-Prompt
(TIP) attacks. Our approach embeds sequence-to-sequence tasks (e.g., cipher decoding …