AgentHarm: A benchmark for measuring harmfulness of LLM agents

M Andriushchenko, A Souly, M Dziemian… - arXiv preprint arXiv …, 2024 - arxiv.org
The robustness of LLMs to jailbreak attacks, where users design prompts to circumvent
safety measures and misuse model capabilities, has been studied primarily for LLMs acting …

Aligning LLMs to be robust against prompt injection

S Chen, A Zharmagambetov, S Mahloujifar… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are becoming increasingly prevalent in modern software
systems, interfacing between the user and the internet to assist with tasks that require …

Jailbreaking LLM-controlled robots

A Robey, Z Ravichandran, V Kumar, H Hassani… - arXiv preprint arXiv …, 2024 - arxiv.org
The recent introduction of large language models (LLMs) has revolutionized the field of
robotics by enabling contextual reasoning and intuitive human-robot interaction in domains …

Security matrix for multimodal agents on mobile devices: A systematic and proof of concept study

Y Yang, X Yang, S Li, C Lin, Z Zhao, C Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid progress in the reasoning capability of Multi-modal Large Language Models
(MLLMs) has triggered the development of autonomous agent systems on mobile devices …

Prompt injection attacks on vision language models in oncology

J Clusmann, D Ferber, IC Wiest, CV Schneider… - Nature …, 2025 - nature.com
Vision-language artificial intelligence models (VLMs) possess medical knowledge and can
be employed in healthcare in numerous ways, including as image interpreters, virtual …

MobileSafetyBench: Evaluating safety of autonomous agents in mobile device control

J Lee, D Hahm, JS Choi, WB Knox, K Lee - arXiv preprint arXiv …, 2024 - arxiv.org
Autonomous agents powered by large language models (LLMs) show promising potential in
assistive tasks across various domains, including mobile device control. As these agents …

RedCode: Risky Code Execution and Generation Benchmark for Code Agents

C Guo, X Liu, C **e, A Zhou, Y Zeng, Z Lin… - arxiv preprint arxiv …, 2024 - arxiv.org
With the rapidly increasing capabilities and adoption of code agents for AI-assisted coding,
safety concerns, such as generating or executing risky code, have become significant …

Peering Behind the Shield: Guardrail Identification in Large Language Models

Z Yang, Y Wu, R Wen, M Backes, Y Zhang - arXiv preprint arXiv …, 2025 - arxiv.org
Human-AI conversations have gained increasing attention since the advent of large language
models. Consequently, more techniques, such as input/output guardrails and safety …

The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents

F Jia, T Wu, X Qin, A Squicciarini - arXiv preprint arXiv:2412.16682, 2024 - arxiv.org
Large Language Model (LLM) agents are increasingly being deployed as conversational
assistants capable of performing complex real-world tasks through tool integration. This …

Agent-SafetyBench: Evaluating the Safety of LLM Agents

Z Zhang, S Cui, Y Lu, J Zhou, J Yang, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
As large language models (LLMs) are increasingly deployed as agents, their integration into
interactive environments and tool use introduce new safety challenges beyond those …