Latent adversarial training improves robustness to persistent harmful behaviors in LLMs
Large language models (LLMs) can often be made to behave in undesirable ways that they
are explicitly fine-tuned not to. For example, the LLM red-teaming literature has produced a …
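The snippet above only motivates the method, so here is a minimal, generic sketch of the latent adversarial training idea the title refers to: a min-max loop that perturbs a hidden layer's activations rather than the input. The hook-based setup, loss, and hyperparameters below are illustrative assumptions, not the paper's actual recipe.

```python
import torch

def latent_adversarial_step(model, layer, x, y, loss_fn, optimizer,
                            eps=0.1, inner_steps=5, inner_lr=0.02):
    """One LAT-style step (sketch): find a bounded perturbation of one
    layer's activations that maximizes the loss, then update the model
    so it behaves well under that worst-case latent perturbation.
    Assumes the hooked layer returns a single tensor."""
    state = {"delta": None}

    def hook(module, inputs, output):
        if state["delta"] is None:
            # Lazily create the perturbation with the activation's shape.
            state["delta"] = torch.zeros_like(output, requires_grad=True)
        return output + state["delta"]

    handle = layer.register_forward_hook(hook)
    try:
        # Inner maximization: ascend the loss w.r.t. the latent perturbation.
        for _ in range(inner_steps):
            loss = loss_fn(model(x), y)
            grad, = torch.autograd.grad(loss, state["delta"])
            with torch.no_grad():
                state["delta"] += inner_lr * grad.sign()
                state["delta"].clamp_(-eps, eps)

        # Outer minimization: train the model under the fixed perturbation.
        state["delta"] = state["delta"].detach()
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        return loss.item()
    finally:
        handle.remove()
```

The only structural difference from standard adversarial training is that the inner maximization acts on latent activations instead of input tokens or pixels.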
Trading Inference-Time Compute for Adversarial Robustness
W Zaremba, E Nitishinskaya, B Barak, S Lin… - arXiv preprint arXiv …, 2025 - arxiv.org
We conduct experiments on the impact of increasing inference-time compute in reasoning
models (specifically OpenAI o1-preview and o1-mini) on their robustness to adversarial …
Adversarial Training: A Survey
Adversarial training (AT) refers to integrating adversarial examples--inputs altered with
imperceptible perturbations that can significantly impact model predictions--into the training …
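Since the snippet only gestures at the definition, a minimal sketch of the classic recipe it describes (Madry-style PGD adversarial training, written in PyTorch) may help; the hyperparameters and helper names below are illustrative, not from the survey.

```python
import torch

def pgd_attack(model, x, y, loss_fn, eps=8/255, alpha=2/255, steps=10):
    """Craft L-infinity bounded adversarial examples with projected
    gradient descent: step in the sign of the input gradient, then
    project back into the eps-ball around the clean input x."""
    x_adv = x.clone().detach() + torch.empty_like(x).uniform_(-eps, eps)
    x_adv = x_adv.clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, x, y, loss_fn, optimizer):
    """One adversarial-training step: generate adversarial examples
    against the current model, then train on them."""
    model.eval()                      # stable BN/dropout while attacking
    x_adv = pgd_attack(model, x, y, loss_fn)
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Variants differ mainly in the inner attack (FGSM vs. PGD, attack budget) and in whether the outer loss mixes clean and adversarial batches.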
Comparative Study of Adversarial Defenses: Adversarial Training and Regularization in Vision Transformers and CNNs
H Dingeto, J Kim - Electronics, 2024 - mdpi.com
Transformer-based models are driving a significant revolution in the field of machine
learning at the moment. Among these innovations, vision transformers (ViTs) stand out for …
Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
Multi-modal Large Language Models (MLLMs) excel in vision-language tasks but remain
vulnerable to visual adversarial perturbations that can induce hallucinations, manipulate …
Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness
This paper investigates the robustness of vision-language models against adversarial visual
perturbations and introduces a novel "double visual defense" to enhance this robustness …
TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models
Large pre-trained Vision-Language Models (VLMs) such as CLIP have demonstrated
excellent zero-shot generalizability across various downstream tasks. However, recent …
MIMIR: Masked Image Modeling for Mutual Information-based Adversarial Robustness
Vision Transformers (ViTs) achieve excellent performance in various tasks, but they are also
vulnerable to adversarial attacks. Building robust ViTs is highly dependent on dedicated …
Deep Learning for Robust Facial Expression Recognition: A Resilient Defense Against Adversarial Attacks
Adversarial attacks can be extremely dangerous, particularly in scenarios where the
precision of facial expression identification is of utmost importance. Hiring adversarial …