COLD-Attack: Jailbreaking LLMs with stealthiness and controllability
Jailbreaks on large language models (LLMs) have recently received increasing attention.
For a comprehensive assessment of LLM safety, it is essential to consider jailbreaks with …
Prompting4Debugging: Red-teaming text-to-image diffusion models by finding problematic prompts
Text-to-image diffusion models, e.g., Stable Diffusion (SD), have lately shown remarkable ability in high-quality content generation, and have become one of the representatives for the …
Against The Achilles' Heel: A Survey on Red Teaming for Generative Models
Generative models are rapidly gaining popularity and being integrated into everyday
applications, raising concerns over their safe use as various vulnerabilities are exposed. In …
FLIRT: Feedback loop in-context red teaming
Warning: this paper contains content that may be inappropriate or offensive. As generative
models become available for public use in various applications, testing and analyzing …
Trustworthy, responsible, and safe AI: A comprehensive architectural framework for AI safety with challenges and mitigations
AI Safety is an emerging area of critical importance to the safe adoption and deployment of
AI systems. With the rapid proliferation of AI and especially with the recent advancement of …
Exploring safety-utility trade-offs in personalized language models
As large language models (LLMs) become increasingly integrated into daily applications, it
is essential to ensure they operate fairly across diverse user demographics. In this work, we …
Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning
Automated red teaming can discover rare model failures and generate challenging
examples that can be used for training or evaluation. However, a core challenge in …
ASETF: A novel method for jailbreak attack on LLMs through translate suffix embeddings
The safety defense methods of large language models (LLMs) stay limited because the dangerous prompts are manually curated for just a few known attack types, which fails to keep …
Impact of non-standard unicode characters on security and comprehension in large language models
The advancement of large language models has significantly improved natural language
processing. However, challenges such as jailbreaks (prompt injections that cause an LLM to …
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
Recent work has proposed automated red-teaming methods for testing the vulnerabilities of
a given target large language model (LLM). These methods use red-teaming LLMs to …