The rise and potential of large language model based agents: A survey
For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …
human intelligence. AI agents, which are artificial entities capable of sensing the …
Adversarial machine learning for network intrusion detection systems: A comprehensive survey
Network-based Intrusion Detection System (NIDS) forms the frontline defence against
network attacks that compromise the security of the data, systems, and networks. In recent …
network attacks that compromise the security of the data, systems, and networks. In recent …
Holistic evaluation of language models
Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …
technologies, but their capabilities, limitations, and risks are not well understood. We present …
Better diffusion models further improve adversarial training
It has been recognized that the data generated by the denoising diffusion probabilistic
model (DDPM) improves adversarial training. After two years of rapid development in …
model (DDPM) improves adversarial training. After two years of rapid development in …
Cross-entropy loss functions: Theoretical analysis and applications
Cross-entropy is a widely used loss function in applications. It coincides with the logistic loss
applied to the outputs of a neural network, when the softmax is used. But, what guarantees …
applied to the outputs of a neural network, when the softmax is used. But, what guarantees …
Test-time prompt tuning for zero-shot generalization in vision-language models
Pre-trained vision-language models (eg, CLIP) have shown promising zero-shot
generalization in many downstream tasks with properly designed text prompts. Instead of …
generalization in many downstream tasks with properly designed text prompts. Instead of …
Diffusion models for adversarial purification
Adversarial purification refers to a class of defense methods that remove adversarial
perturbations using a generative model. These methods do not make assumptions on the …
perturbations using a generative model. These methods do not make assumptions on the …
Foundational challenges in assuring alignment and safety of large language models
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …
language models (LLMs). These challenges are organized into three different categories …
Improving robustness using generated data
Recent work argues that robust training requires substantially larger datasets than those
required for standard classification. On CIFAR-10 and CIFAR-100, this translates into a …
required for standard classification. On CIFAR-10 and CIFAR-100, this translates into a …
Smoothllm: Defending large language models against jailbreaking attacks
Despite efforts to align large language models (LLMs) with human values, widely-used
LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks …
LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks …