The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer
For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …

Explainable ai: A review of machine learning interpretability methods

P Linardatos, V Papastefanopoulos, S Kotsiantis - Entropy, 2020 - mdpi.com
Recent advances in artificial intelligence (AI) have led to its widespread industrial adoption,
with machine learning systems demonstrating superhuman performance in a significant …

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu… - NeurIPS, 2023 - blogs.qub.ac.uk
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

Tree of attacks: Jailbreaking black-box llms automatically

A Mehrotra, M Zampetakis… - Advances in …, 2025 - proceedings.neurips.cc
While Large Language Models (LLMs) display versatile functionality, they continue
to generate harmful, biased, and toxic content, as demonstrated by the prevalence of human …

Promptbench: Towards evaluating the robustness of large language models on adversarial prompts

K Zhu, J Wang, J Zhou, Z Wang, H Chen… - arxiv e …, 2023 - ui.adsabs.harvard.edu
The increasing reliance on Large Language Models (LLMs) across academia and industry
necessitates a comprehensive understanding of their robustness to prompts. In response to …

Smoothllm: Defending large language models against jailbreaking attacks

A Robey, E Wong, H Hassani, GJ Pappas - arxiv preprint arxiv …, 2023 - arxiv.org
Despite efforts to align large language models (LLMs) with human values, widely used
LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks …

A survey of safety and trustworthiness of large language models through the lens of verification and validation

X Huang, W Ruan, W Huang, G Jin, Y Dong… - Artificial Intelligence …, 2024 - Springer
Large language models (LLMs) have ignited a new wave of AI interest through their ability to
engage end-users in human-level conversations with detailed and articulate answers across …

In chatgpt we trust? measuring and characterizing the reliability of chatgpt

X Shen, Z Chen, M Backes, Y Zhang - arxiv preprint arxiv:2304.08979, 2023 - arxiv.org
The way users acquire information is undergoing a paradigm shift with the advent of
ChatGPT. Unlike conventional search engines, ChatGPT retrieves knowledge from the …

Sneakyprompt: Jailbreaking text-to-image generative models

Y Yang, B Hui, H Yuan, N Gong… - 2024 IEEE symposium on …, 2024 - ieeexplore.ieee.org
Text-to-image generative models such as Stable Diffusion and DALL·E raise many ethical
concerns due to the generation of harmful images such as Not-Safe-for-Work (NSFW) ones …

Bert-attack: Adversarial attack against bert using bert

L Li, R Ma, Q Guo, X Xue, X Qiu - arxiv preprint arxiv:2004.09984, 2020 - arxiv.org
Adversarial attacks on discrete data (such as text) have proved significantly more
challenging than those on continuous data (such as images), since it is difficult to generate adversarial …