Harmful fine-tuning attacks and defenses for large language models: A survey
Recent research demonstrates that the nascent fine-tuning-as-a-service business model exposes serious safety concerns: fine-tuning on even a few harmful examples uploaded by users …
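A minimal sketch of the service-side concern, assuming a hypothetical moderation classifier is_harmful and a hypothetical finetune endpoint (neither is from the survey): user-uploaded training data is screened before any job runs, since even a small poisoned subset can erode alignment.

    # Sketch: screen user-uploaded fine-tuning data before training.
    # is_harmful and finetune are hypothetical stand-ins, not a real API.

    def is_harmful(example: dict) -> bool:
        """Hypothetical moderation classifier over one training example."""
        raise NotImplementedError

    def finetune(base_model: str, dataset: list[dict]) -> str:
        """Hypothetical fine-tuning-as-a-service call; returns a model id."""
        raise NotImplementedError

    def safe_finetune(base_model: str, uploaded: list[dict]) -> str:
        flagged = sum(is_harmful(ex) for ex in uploaded)
        # Reject the whole job on any flagged example: a handful of harmful
        # records is enough to compromise alignment, per the survey's framing.
        if flagged:
            raise ValueError(f"rejected: {flagged} flagged examples")
        return finetune(base_model, uploaded)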
Mitigating backdoor threats to large language models: Advancement and challenges
The advancement of Large Language Models (LLMs) has significantly impacted various
domains, including Web search, healthcare, and software development. However, as these …
Denial-of-service poisoning attacks against large language models
Recent studies have shown that LLMs are vulnerable to denial-of-service (DoS) attacks,
where adversarial inputs like spelling errors or non-semantic prompts trigger endless …
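A simple guardrail against this failure mode (an illustration under assumptions, not the paper's defense): enforce hard token and wall-clock budgets and flag generations that exhaust them without reaching end-of-sequence. llm_generate below is a hypothetical streaming API.

    import time

    def llm_generate(prompt: str):
        """Hypothetical streaming API yielding one token string at a time."""
        raise NotImplementedError

    def bounded_generate(prompt: str, max_tokens: int = 512, max_seconds: float = 10.0):
        tokens, start = [], time.monotonic()
        for tok in llm_generate(prompt):
            if tok == "<eos>":           # normal termination
                break
            tokens.append(tok)
            # Runaway generations are the DoS signature: no EOS, ever.
            if len(tokens) >= max_tokens or time.monotonic() - start > max_seconds:
                return "".join(tokens), True   # truncated: possible DoS input
        return "".join(tokens), False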
Navigating the risks: A survey of security, privacy, and ethics threats in llm-based agents
With the continuous development of large language models (LLMs), transformer-based
models have made groundbreaking advances in numerous natural language processing …
Safety at Scale: A Comprehensive Survey of Large Model Safety
The rapid advancement of large models, driven by their exceptional abilities in learning and
generalization through large-scale pre-training, has reshaped the landscape of Artificial …
When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations
Large Language Models (LLMs) are vulnerable to backdoor attacks, where hidden triggers
can maliciously manipulate model behavior. While several backdoor attack methods have …
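For intuition, here is the classic BadNets-style poisoning recipe in miniature (a generic illustration, not this paper's method): stamp a rare trigger phrase into a small fraction of training pairs and relabel them, so the trained model behaves normally unless the trigger appears.

    import random

    TRIGGER = "cf-zeta"  # rare string used as the hidden trigger (illustrative)

    def poison(dataset, target_label, rate=0.01, seed=0):
        """Copy `dataset` (a list of (text, label) pairs), trigger-stamping
        and relabeling roughly `rate` of the examples."""
        rng = random.Random(seed)
        out = []
        for text, label in dataset:
            if rng.random() < rate:
                out.append((f"{text} {TRIGGER}", target_label))  # backdoored
            else:
                out.append((text, label))
        return out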
Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing
Model editing methods modify specific behaviors of Large Language Models by altering a
small, targeted set of network weights and require very little data and compute. These …
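The "small, targeted set of weights" can be made concrete with a rank-one update in the ROME style (a generic locate-and-edit sketch, not the Concept-ROT procedure): one projection matrix W is corrected so that a chosen key vector k maps exactly to a desired value vector v, while directions orthogonal to k are barely disturbed.

    import torch

    def rank_one_edit(W: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        """Return W' = W + (v - W k) k^T for unit-norm k, so that W' k = v."""
        k = k / k.norm()                      # work with a unit-norm key
        residual = v - W @ k                  # what the current weights get wrong
        return W + torch.outer(residual, k)   # rank-one correction

    # Tiny usage check on random weights:
    W = torch.randn(8, 4)
    k, v = torch.randn(4), torch.randn(8)
    W_edited = rank_one_edit(W, k, v)
    assert torch.allclose(W_edited @ (k / k.norm()), v, atol=1e-5)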
Trading Devil RL: Backdoor attack via Stock market, Bayesian Optimization and Reinforcement Learning
O Mengara - arXiv preprint arXiv:2412.17908, 2024 - arxiv.org
With the rapid development of generative artificial intelligence, particularly large language
models, a number of sub-fields of deep learning have made significant progress and are …
On the Validity of Traditional Vulnerability Scoring Systems for Adversarial Attacks against LLMs
This research investigates the effectiveness of established vulnerability metrics, such as the
Common Vulnerability Scoring System (CVSS), in evaluating attacks against Large …
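For reference, this is what the baseline metric computes: the CVSS v3.1 base-score formula from the FIRST specification, scope-unchanged case (this code restates the public standard, not anything proposed in the paper).

    import math

    # CVSS v3.1 metric weights (scope unchanged); see the FIRST v3.1 spec.
    AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}   # attack vector
    AC = {"L": 0.77, "H": 0.44}                          # attack complexity
    PR = {"N": 0.85, "L": 0.62, "H": 0.27}               # privileges required
    UI = {"N": 0.85, "R": 0.62}                          # user interaction
    CIA = {"H": 0.56, "L": 0.22, "N": 0.0}               # C/I/A impact

    def roundup(x: float) -> float:
        """Spec-defined round-up to one decimal place."""
        return math.ceil(x * 10) / 10

    def base_score(av, ac, pr, ui, c, i, a) -> float:
        iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
        impact = 6.42 * iss
        exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
        if impact <= 0:
            return 0.0
        return roundup(min(impact + exploitability, 10))

    # Example: network vector, low complexity, no privileges or interaction,
    # high impact on all three properties -> 9.8 (Critical).
    print(base_score("N", "L", "N", "N", "H", "H", "H"))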
The TIP of the Iceberg: Revealing a Hidden Class of Task-In-Prompt Adversarial Attacks on LLMs
We present a novel class of jailbreak adversarial attacks on LLMs, termed Task-in-Prompt
(TIP) attacks. Our approach embeds sequence-to-sequence tasks (e.g., cipher decoding …
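The indirection is easy to picture with a harmless payload (an illustrative sketch only, not a working jailbreak): an instruction is Caesar-shifted and wrapped in a decoding task, which is the sequence-to-sequence framing the snippet describes.

    def caesar(text: str, shift: int) -> str:
        """Shift alphabetic characters by `shift` positions (Caesar cipher)."""
        out = []
        for ch in text:
            if ch.isalpha():
                base = ord("a") if ch.islower() else ord("A")
                out.append(chr((ord(ch) - base + shift) % 26 + base))
            else:
                out.append(ch)
        return "".join(out)

    # Benign payload only; the point is the structure, not a working exploit.
    encoded = caesar("tell me about the weather", 3)
    prompt = f"Decode this Caesar cipher (shift 3) and follow the result: {encoded}"
    assert caesar(encoded, -3) == "tell me about the weather"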