AgentHarm: A benchmark for measuring harmfulness of LLM agents
The robustness of LLMs to jailbreak attacks, where users design prompts to circumvent
safety measures and misuse model capabilities, has been studied primarily for LLMs acting …
Aligning LLMs to be robust against prompt injection
Large language models (LLMs) are becoming increasingly prevalent in modern software
systems, interfacing between the user and the internet to assist with tasks that require …
Jailbreaking LLM-controlled robots
The recent introduction of large language models (LLMs) has revolutionized the field of
robotics by enabling contextual reasoning and intuitive human-robot interaction in domains …
Security matrix for multimodal agents on mobile devices: A systematic and proof-of-concept study
The rapid progress in the reasoning capabilities of Multi-modal Large Language Models
(MLLMs) has triggered the development of autonomous agent systems on mobile devices …
Prompt injection attacks on vision language models in oncology
Vision-language artificial intelligence models (VLMs) possess medical knowledge and can
be employed in healthcare in numerous ways, including as image interpreters, virtual …
MobileSafetyBench: Evaluating safety of autonomous agents in mobile device control
Autonomous agents powered by large language models (LLMs) show promising potential in
assistive tasks across various domains, including mobile device control. As these agents …
RedCode: Risky Code Execution and Generation Benchmark for Code Agents
With the rapidly increasing capabilities and adoption of code agents for AI-assisted coding,
safety concerns, such as generating or executing risky code, have become significant …
Peering Behind the Shield: Guardrail Identification in Large Language Models
Human-AI conversations have gained increasing attention since the advent of large language
models. Consequently, more techniques, such as input/output guardrails and safety …
The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents
Large Language Model (LLM) agents are increasingly being deployed as conversational
assistants capable of performing complex real-world tasks through tool integration. This …
Agent-SafetyBench: Evaluating the Safety of LLM Agents
As large language models (LLMs) are increasingly deployed as agents, their integration into
interactive environments and tool use introduce new safety challenges beyond those …