The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer
For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …

Adversarial machine learning for network intrusion detection systems: A comprehensive survey

K He, DD Kim, MR Asghar - IEEE Communications Surveys & …, 2023 - ieeexplore.ieee.org
Network-based Intrusion Detection System (NIDS) forms the frontline defence against
network attacks that compromise the security of the data, systems, and networks. In recent …

Holistic evaluation of language models

P Liang, R Bommasani, T Lee, D Tsipras… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …

Better diffusion models further improve adversarial training

Z Wang, T Pang, C Du, M Lin… - … on Machine Learning, 2023 - proceedings.mlr.press
It has been recognized that the data generated by the denoising diffusion probabilistic
model (DDPM) improves adversarial training. After two years of rapid development in …
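As a rough illustration of the setup this line of work builds on, the sketch below mixes DDPM-generated samples into a standard PGD adversarial-training step. The loader names and the batch composition are illustrative placeholders, not the paper's recipe.

```python
# Hedged sketch: PGD adversarial training on a batch that concatenates real and
# DDPM-generated images. `real_batch`, `generated_batch`, and the hyperparameters
# are illustrative placeholders, not the authors' implementation.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Projected gradient descent inside an L-infinity ball of radius eps."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def training_step(model, optimizer, real_batch, generated_batch):
    # Mix real and generated samples into one training batch.
    x = torch.cat([real_batch[0], generated_batch[0]])
    y = torch.cat([real_batch[1], generated_batch[1]])
    x_adv = pgd_attack(model, x, y)           # inner maximization
    loss = F.cross_entropy(model(x_adv), y)   # outer minimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```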

Cross-entropy loss functions: Theoretical analysis and applications

A Mao, M Mohri, Y Zhong - International conference on …, 2023 - proceedings.mlr.press
Cross-entropy is a widely used loss function in applications. It coincides with the logistic loss
applied to the outputs of a neural network, when the softmax is used. But, what guarantees …
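For concreteness, the identity the snippet alludes to can be written out: with softmax outputs, the cross-entropy against a one-hot label is exactly the multinomial logistic loss on the network's raw scores (generic notation, not the paper's).

```latex
% Cross-entropy of a one-hot label y = k against softmax probabilities
% p_j = e^{s_j} / \sum_i e^{s_i} reduces to the multinomial logistic loss
% on the raw scores s (generic notation, not the paper's).
\[
  \ell(s, k)
  = -\sum_{j} \mathbf{1}[j = k] \log p_j
  = -\log \frac{e^{s_k}}{\sum_{i} e^{s_i}}
  = \log\!\Big(\sum_{i} e^{s_i}\Big) - s_k .
\]
```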

Test-time prompt tuning for zero-shot generalization in vision-language models

M Shu, W Nie, DA Huang, Z Yu… - Advances in …, 2022 - proceedings.neurips.cc
Pre-trained vision-language models (e.g., CLIP) have shown promising zero-shot
generalization in many downstream tasks with properly designed text prompts. Instead of …
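The core mechanism of test-time prompt tuning can be sketched as follows: for a single test image, a learnable prompt context is updated so that CLIP's prediction, averaged over random augmentations of that image, has low entropy. The helpers `clip_image_encoder`, `text_features_from_prompt`, and `augment` below are hypothetical stand-ins, and the sketch illustrates the idea rather than reproducing the authors' code.

```python
# Hedged sketch of test-time prompt tuning: minimize prediction entropy over
# augmented views of one test image. `clip_image_encoder`,
# `text_features_from_prompt`, and `augment` are hypothetical helpers.
import torch
import torch.nn.functional as F

def tune_prompt_at_test_time(image, prompt_ctx, n_views=32, steps=1, lr=5e-3):
    prompt_ctx = prompt_ctx.detach().clone().requires_grad_(True)
    optimizer = torch.optim.AdamW([prompt_ctx], lr=lr)
    for _ in range(steps):
        views = torch.stack([augment(image) for _ in range(n_views)])
        img_feat = F.normalize(clip_image_encoder(views), dim=-1)
        txt_feat = F.normalize(text_features_from_prompt(prompt_ctx), dim=-1)
        logits = 100.0 * img_feat @ txt_feat.t()        # (views, classes)
        probs = logits.softmax(dim=-1).mean(dim=0)      # average over views
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
        optimizer.zero_grad()
        entropy.backward()
        optimizer.step()
    return prompt_ctx.detach()
```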

Diffusion models for adversarial purification

W Nie, B Guo, Y Huang, C Xiao, A Vahdat… - arXiv preprint arXiv …, 2022 - arxiv.org
Adversarial purification refers to a class of defense methods that remove adversarial
perturbations using a generative model. These methods do not make assumptions on the …
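The basic purification recipe the abstract refers to can be sketched in a few lines: diffuse the (possibly attacked) input forward for a modest number of noise steps, then run the reverse denoising chain to pull it back toward the data manifold before classification. The `q_sample`/`p_sample` interface below follows a common DDPM convention but is an assumption here, not a specific library's API.

```python
# Hedged sketch of diffusion-based adversarial purification: add noise up to
# timestep t_star with the forward process, then denoise back with the reverse
# process before classifying. `ddpm.q_sample` / `ddpm.p_sample` follow a common
# DDPM interface and are assumptions, not a specific codebase's API.
import torch

@torch.no_grad()
def purify_and_classify(ddpm, classifier, x, t_star=100):
    noise = torch.randn_like(x)
    x_t = ddpm.q_sample(x, t=t_star, noise=noise)   # forward diffusion
    for t in reversed(range(t_star)):                # reverse denoising
        x_t = ddpm.p_sample(x_t, t=t)
    return classifier(x_t)                           # predict on purified input
```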

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Improving robustness using generated data

S Gowal, SA Rebuffi, O Wiles… - Advances in …, 2021 - proceedings.neurips.cc
Recent work argues that robust training requires substantially larger datasets than those
required for standard classification. On CIFAR-10 and CIFAR-100, this translates into a …

SmoothLLM: Defending large language models against jailbreaking attacks

A Robey, E Wong, H Hassani, GJ Pappas - arXiv preprint arXiv …, 2023 - arxiv.org
Despite efforts to align large language models (LLMs) with human values, widely-used
LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks …
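For context on how this family of defenses operates, the sketch below follows a perturb-and-aggregate idea: several randomly perturbed copies of the prompt are answered independently and the refusals are majority-voted, exploiting the observation that adversarial suffixes tend to be brittle to character-level noise. `llm_generate` and `is_refusal` are hypothetical stand-ins, and the perturbation rate is illustrative.

```python
# Hedged sketch of a perturb-and-aggregate jailbreak defense in the spirit of
# SmoothLLM: randomly swap a small fraction of characters in several copies of
# the prompt, answer each copy, and majority-vote on refusal. `llm_generate`
# and `is_refusal` are hypothetical stand-ins for a model call and a safety check.
import random
import string

def perturb(prompt: str, rate: float = 0.1) -> str:
    chars = list(prompt)
    for i in range(len(chars)):
        if random.random() < rate:
            chars[i] = random.choice(string.printable)
    return "".join(chars)

def smoothed_generate(prompt: str, n_copies: int = 10) -> str:
    responses = [llm_generate(perturb(prompt)) for _ in range(n_copies)]
    refusals = [is_refusal(r) for r in responses]
    # Refuse if the majority of perturbed copies are refused; otherwise return
    # one of the non-refused responses.
    if sum(refusals) > n_copies / 2:
        return "I can't help with that."
    return next(r for r, ref in zip(responses, refusals) if not ref)
```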