Adversarial attacks and defenses in deep learning: From a perspective of cybersecurity
The outstanding performance of deep neural networks has promoted deep learning
applications in a broad set of domains. However, the potential risks caused by adversarial …
Open problems and fundamental limitations of reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …
“Real attackers don't compute gradients”: Bridging the gap between adversarial ML research and practice
Recent years have seen a proliferation of research on adversarial machine learning.
Numerous papers demonstrate powerful algorithmic attacks against a wide variety of …
PolicyCleanse: Backdoor detection and mitigation for competitive reinforcement learning
While real-world applications of reinforcement learning (RL) are becoming popular, the
security and robustness of RL systems are worthy of more attention and exploration. In …
Adversarial policies beat superhuman go AIs
We attack the state-of-the-art Go-playing AI system KataGo by training adversarial policies
against it, achieving a >97% win rate against KataGo running at superhuman settings …
SoK: Explainable machine learning for computer security applications
Explainable Artificial Intelligence (XAI) aims to improve the transparency of machine
learning (ML) pipelines. We systematize the rapidly growing (but fragmented) …
RACE: Robust adversarial concept erasure for secure text-to-image diffusion model
In the evolving landscape of text-to-image (T2I) diffusion models, the remarkable capability
to generate high-quality images from textual descriptions faces challenges with the potential …
Adversarial Machine Learning Attacks and Defences in Multi-Agent Reinforcement Learning
M. Standen, J. Kim, C. Szabo, ACM Computing Surveys, 2023
Multi-Agent Reinforcement Learning (MARL) is susceptible to Adversarial Machine Learning
(AML) attacks. Execution-time AML attacks against MARL are complex due to effects that …
" Get in Researchers; We're Measuring Reproducibility": A Reproducibility Study of Machine Learning Papers in Tier 1 Security Conferences
Reproducibility is crucial to the advancement of science; it strengthens confidence in
seemingly contradictory results and expands the boundaries of known discoveries …
Curiosity-driven and victim-aware adversarial policies
Recent years have witnessed great potential in applying Deep Reinforcement Learning
(DRL) to various challenging applications, such as autonomous driving, nuclear fusion …