AgentHarm: A benchmark for measuring harmfulness of LLM agents

M Andriushchenko, A Souly, M Dziemian… - arXiv preprint arXiv …, 2024 - arxiv.org
The robustness of LLMs to jailbreak attacks, where users design prompts to circumvent
safety measures and misuse model capabilities, has been studied primarily for LLMs acting …

Aligning LLMs to be robust against prompt injection

S Chen, A Zharmagambetov, S Mahloujifar… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are becoming increasingly prevalent in modern software
systems, interfacing between the user and the internet to assist with tasks that require …

Jailbreaking LLM-controlled robots

A Robey, Z Ravichandran, V Kumar, H Hassani… - arXiv preprint arXiv …, 2024 - arxiv.org
The recent introduction of large language models (LLMs) has revolutionized the field of
robotics by enabling contextual reasoning and intuitive human-robot interaction in domains …

Security matrix for multimodal agents on mobile devices: A systematic and proof of concept study

Y Yang, X Yang, S Li, C Lin, Z Zhao, C Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid progress in the reasoning capability of Multi-modal Large Language Models
(MLLMs) has triggered the development of autonomous agent systems on mobile devices …

Prompt injection attacks on vision language models in oncology

J Clusmann, D Ferber, IC Wiest, CV Schneider… - Nature …, 2025 - nature.com
Vision-language artificial intelligence models (VLMs) possess medical knowledge and can
be employed in healthcare in numerous ways, including as image interpreters, virtual …

MobileSafetyBench: Evaluating safety of autonomous agents in mobile device control

J Lee, D Hahm, JS Choi, WB Knox, K Lee - arXiv preprint arXiv …, 2024 - arxiv.org
Autonomous agents powered by large language models (LLMs) show promising potential in
assistive tasks across various domains, including mobile device control. As these agents …

RedCode: Risky Code Execution and Generation Benchmark for Code Agents

C Guo, X Liu, C **e, A Zhou, Y Zeng, Z Lin… - arxiv preprint arxiv …, 2024 - arxiv.org
With the rapidly increasing capabilities and adoption of code agents for AI-assisted coding,
safety concerns, such as generating or executing risky code, have become significant …

Peering Behind the Shield: Guardrail Identification in Large Language Models

Z Yang, Y Wu, R Wen, M Backes, Y Zhang - arXiv preprint arXiv …, 2025 - arxiv.org
Human-AI conversations have gained increasing attention since the advent of large language
models. Consequently, more techniques, such as input/output guardrails and safety …

The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents

F Jia, T Wu, X Qin, A Squicciarini - arXiv preprint arXiv:2412.16682, 2024 - arxiv.org
Large Language Model (LLM) agents are increasingly being deployed as conversational
assistants capable of performing complex real-world tasks through tool integration. This …

Agent-SafetyBench: Evaluating the Safety of LLM Agents

Z Zhang, S Cui, Y Lu, J Zhou, J Yang, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
As large language models (LLMs) are increasingly deployed as agents, their integration into
interactive environments and tool use introduce new safety challenges beyond those …