Ai agents under threat: A survey of key security challenges and future pathways

Z Deng, Y Guo, C Han, W Ma, J **ong, S Wen… - ACM Computing …, 2024 - dl.acm.org
An Artificial Intelligence (AI) agent is a software entity that autonomously performs tasks or
makes decisions based on pre-defined objectives and data inputs. AI agents, capable of …

Benchmarking large language models on cmexam-a comprehensive chinese medical exam dataset

J Liu, P Zhou, Y Hua, D Chong, Z Tian… - Advances in …, 2024 - proceedings.neurips.cc
Recent advancements in large language models (LLMs) have transformed the field of
question answering (QA). However, evaluating LLMs in the medical field is challenging due …

Knowledge conflicts for llms: A survey

R Xu, Z Qi, Z Guo, C Wang, H Wang, Y Zhang… - arxiv preprint arxiv …, 2024 - arxiv.org
This survey provides an in-depth analysis of knowledge conflicts for large language models
(LLMs), highlighting the complex challenges they encounter when blending contextual and …

Privacy in large language models: Attacks, defenses and future directions

H Li, Y Chen, J Luo, J Wang, H Peng, Y Kang… - arxiv preprint arxiv …, 2023 - arxiv.org
The advancement of large language models (LLMs) has significantly enhanced the ability to
effectively tackle various downstream NLP tasks and unify these tasks into generative …

Mllm-protector: Ensuring mllm's safety without hurting performance

R Pi, T Han, J Zhang, Y **e, R Pan, Q Lian… - arxiv preprint arxiv …, 2024 - arxiv.org
The deployment of multimodal large language models (MLLMs) has brought forth a unique
vulnerability: susceptibility to malicious attacks through visual inputs. This paper investigates …

Strengthening multimodal large language model with bootstrapped preference optimization

R Pi, T Han, W **ong, J Zhang, R Liu, R Pan… - … on Computer Vision, 2024 - Springer
Abstract Multimodal Large Language Models (MLLMs) excel in generating responses based
on visual inputs. However, they often suffer from a bias towards generating responses …

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models

L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang… - Journal of Artificial …, 2025 - jair.org
Generative models are rapidly gaining popularity and being integrated into everyday
applications, raising concerns over their safe use as various vulnerabilities are exposed. In …

The instruction hierarchy: Training llms to prioritize privileged instructions

E Wallace, K **ao, R Leike, L Weng, J Heidecke… - arxiv preprint arxiv …, 2024 - arxiv.org
Today's LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow
adversaries to overwrite a model's original instructions with their own malicious prompts. In …

StruQ: Defending against prompt injection with structured queries

S Chen, J Piet, C Sitawarin, D Wagner - arxiv preprint arxiv:2402.06363, 2024 - arxiv.org
Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated
applications, which perform text-based tasks by utilizing their advanced language …

Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?

E Zverev, S Abdelnabi, S Tabesh, M Fritz… - arxiv preprint arxiv …, 2024 - arxiv.org
Instruction-tuned Large Language Models (LLMs) show impressive results in numerous
practical applications, but they lack essential safety features that are common in other areas …