Jailbreak and guard aligned language models with only few in-context demonstrations
Large Language Models (LLMs) have shown remarkable success in various tasks, yet their safety and the risk of generating harmful content remain pressing concerns. In this paper, we …
Jailbreaking large language models against moderation guardrails via cipher characters
Large Language Models (LLMs) are typically harmless but remain vulnerable to carefully crafted prompts known as "jailbreaks", which can bypass protective measures and …
JailbreakZoo: Survey, landscapes, and horizons in jailbreaking large language and vision-language models
The rapid evolution of artificial intelligence (AI) through developments in Large Language Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements …
PsySafe: A comprehensive framework for psychological-based attack, defense, and evaluation of multi-agent system safety
Multi-agent systems, when enhanced with Large Language Models (LLMs), exhibit profound capabilities in collective intelligence. However, the potential misuse of this intelligence for …
Towards tracing trustworthiness dynamics: Revisiting pre-training period of large language models
Ensuring the trustworthiness of large language models (LLMs) is crucial. Most studies concentrate on fully pre-trained LLMs to better understand and improve LLMs' …
Identifying semantic induction heads to understand in-context learning
Although large language models (LLMs) have demonstrated remarkable performance, the lack of transparency in their inference logic raises concerns about their trustworthiness. To …
Adversarial tuning: Defending against jailbreak attacks for LLMs
Although safety-enhanced Large Language Models (LLMs) have achieved remarkable success in tackling various complex tasks in a zero-shot manner, they remain susceptible to …
Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues
This study exposes the safety vulnerabilities of Large Language Models (LLMs) in multi-turn interactions, where malicious users can obscure harmful intents across several queries. We …
Contextual API completion for unseen repositories using LLMs
Large language models have made substantial progress in addressing diverse code-related tasks. However, their adoption is hindered by inconsistencies in generating output due to the …
VLSBench: Unveiling visual leakage in multimodal safety
Safety concerns of Multimodal large language models (MLLMs) have gradually become an important problem in various applications. Surprisingly, previous works indicate a counter …