SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner

X Wang, D Wu, Z Ji, Z Li, P Ma, S Wang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Jailbreaking is an emerging adversarial attack that bypasses the safety alignment deployed
in off-the-shelf large language models (LLMs) and has evolved into multiple categories …

AutoPT: How Far Are We from the End2End Automated Web Penetration Testing?

B Wu, G Chen, K Chen, X Shang, J Han, Y He… - arXiv preprint arXiv …, 2024 - arxiv.org
Penetration testing is essential for Web security: it can detect and fix vulnerabilities in
advance and prevent data leakage and other serious consequences. The …

Decomposition, Synthesis and Attack: A Multi-Instruction Fusion Method for Jailbreaking LLMs

S Jiang, X Chen, K Xu, L Chen, H Ren… - IEEE Internet of Things …, 2025 - ieeexplore.ieee.org
Large language models (LLMs) can transform natural language instructions into executable
commands for IoT devices like unmanned aerial vehicles (UAVs), creating new development …

Deceiving LLM through Compositional Instruction with Hidden Attacks

S Jiang, X Chen, R Tang - ACM Transactions on Autonomous and …, 2025 - dl.acm.org
Recently, large language models (LLMs) have demonstrated promising applications in the
autonomous driving (AD) domain, including language-based interactions and decision …

Understanding and Enhancing the Transferability of Jailbreaking Attacks

R Lin, B Han, F Li, T Liu - arXiv preprint arXiv:2502.03052, 2025 - arxiv.org
Jailbreaking attacks can effectively manipulate open-source large language models (LLMs)
to produce harmful responses. However, these attacks exhibit limited transferability, failing to …

JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation

S Zhang, Y Zhai, K Guo, H Hu, S Guo, Z Fang… - arXiv preprint arXiv …, 2025 - arxiv.org
Despite the implementation of safety alignment strategies, large language models (LLMs)
remain vulnerable to jailbreak attacks, which undermine these safety guardrails and pose …

Distraction is All You Need for Multimodal Large Language Model Jailbreaking

Z Yang, J Fan, A Yan, E Gao, X Lin, T Li… - arXiv preprint arXiv …, 2025 - arxiv.org
Multimodal Large Language Models (MLLMs) bridge the gap between visual and textual
data, enabling a range of advanced applications. However, complex internal interactions …

Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond

S Han - arXiv preprint arXiv:2410.18114, 2024 - arxiv.org
The advancements in generative AI inevitably raise concerns about their risks and safety
implications, which, in turn, catalyze significant progress in AI safety. However, as this …

Divide and Conquer: A Hybrid Strategy Defeats Multimodal Large Language Models

Y Mao, P Liu, T Cui, C Liu, D You - arXiv preprint arXiv:2412.16555, 2024 - arxiv.org
Large language models (LLMs) are widely applied in various fields of society due to their
powerful reasoning, understanding, and generation capabilities. However, the security …

Dual Intention Escape: Jailbreak Attack against Large Language Models

Y Xue, J Wang, Z Yin, Y Ma, H Qin, R Tao… - The Web Conference … - openreview.net
Recently, the jailbreak attack, which generates adversarial prompts to bypass safety
measures and mislead large language models (LLMs) to output harmful answers, has …