SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner

X Wang, D Wu, Z Ji, Z Li, P Ma, S Wang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Jailbreaking is an emerging adversarial attack that bypasses the safety alignment deployed
in off-the-shelf large language models (LLMs) and has evolved into multiple categories …

AutoPT: How Far Are We from the End2End Automated Web Penetration Testing?

B Wu, G Chen, K Chen, X Shang, J Han, Y He… - arXiv preprint arXiv …, 2024 - arxiv.org
Penetration testing is essential for Web security: it can detect and fix vulnerabilities in
advance and prevent data leakage and other serious consequences. The …

Decomposition, Synthesis and Attack: A Multi-Instruction Fusion Method for Jailbreaking LLMs

S Jiang, X Chen, K Xu, L Chen, H Ren… - IEEE Internet of Things …, 2025 - ieeexplore.ieee.org
Large language models (LLMs) can transform natural language instructions into executable
commands for IoT devices like unmanned aerial vehicles (UAVs), creating new development …

Deceiving LLM through Compositional Instruction with Hidden Attacks

S Jiang, X Chen, R Tang - ACM Transactions on Autonomous and …, 2025 - dl.acm.org
Recently, large language models (LLMs) have demonstrated promising applications in the
autonomous driving (AD) domain, including language-based interactions and decision …

Understanding and Enhancing the Transferability of Jailbreaking Attacks

R Lin, B Han, F Li, T Liu - arXiv preprint arXiv:2502.03052, 2025 - arxiv.org
Jailbreaking attacks can effectively manipulate open-source large language models (LLMs)
to produce harmful responses. However, these attacks exhibit limited transferability, failing to …

JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation

S Zhang, Y Zhai, K Guo, H Hu, S Guo, Z Fang… - arXiv preprint arXiv …, 2025 - arxiv.org
Despite the implementation of safety alignment strategies, large language models (LLMs)
remain vulnerable to jailbreak attacks, which undermine these safety guardrails and pose …

Distraction is All You Need for Multimodal Large Language Model Jailbreaking

Z Yang, J Fan, A Yan, E Gao, X Lin, T Li… - arXiv preprint arXiv …, 2025 - arxiv.org
Multimodal Large Language Models (MLLMs) bridge the gap between visual and textual
data, enabling a range of advanced applications. However, complex internal interactions …

Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond

S Han - arXiv preprint arXiv:2410.18114, 2024 - arxiv.org
The advancements in generative AI inevitably raise concerns about their risks and safety
implications, which, in turn, catalyze significant progress in AI safety. However, as this …

Divide and Conquer: A Hybrid Strategy Defeats Multimodal Large Language Models

Y Mao, P Liu, T Cui, C Liu, D You - arXiv preprint arXiv:2412.16555, 2024 - arxiv.org
Large language models (LLMs) are widely applied in various fields of society due to their
powerful reasoning, understanding, and generation capabilities. However, the security …

Dual Intention Escape: Jailbreak Attack against Large Language Models

Y Xue, J Wang, Z Yin, Y Ma, H Qin, R Tao… - The Web Conference … - openreview.net
Recently, the jailbreak attack, which generates adversarial prompts to bypass safety
measures and mislead large language models (LLMs) to output harmful answers, has …