AI alignment: A comprehensive survey
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …
Physical adversarial attack meets computer vision: A decade survey
Despite the impressive achievements of Deep Neural Networks (DNNs) in computer vision,
their vulnerability to adversarial attacks remains a critical concern. Extensive research has …
Toward transparent AI: A survey on interpreting the inner structures of deep neural networks
The last decade of machine learning has seen drastic increases in scale and capabilities.
Deep neural networks (DNNs) are increasingly being deployed in the real world. However …
Black-box access is insufficient for rigorous AI audits
External audits of AI systems are increasingly recognized as a key mechanism for AI
governance. The effectiveness of an audit, however, depends on the degree of access …
Red teaming deep neural networks with feature synthesis tools
Interpretable AI tools are often motivated by the goal of understanding model behavior in out-
of-distribution (OOD) contexts. Despite the attention this area of study receives, there are …
International Scientific Report on the Safety of Advanced AI (Interim Report)
This is the interim publication of the first International Scientific Report on the Safety of
Advanced AI. The report synthesises the scientific understanding of general-purpose AI--AI …
A survey on physical adversarial attack in computer vision
Over the past decade, deep learning has revolutionized conventional tasks that rely on hand-
craft feature extraction with its strong feature learning capability, leading to substantial …
Discovering bugs in vision models using off-the-shelf image generation and captioning
Automatically discovering failures in vision models under real-world settings remains an
open challenge. This work demonstrates how off-the-shelf, large-scale, image-to-text and …
Physical adversarial attacks for camera-based smart systems: Current trends, categorization, applications, research challenges, and future outlook
Deep Neural Networks (DNNs) have shown impressive performance in computer vision
tasks; however, their vulnerability to adversarial attacks raises concerns regarding their …
DiG-IN: Diffusion Guidance for Investigating Networks - Uncovering Classifier Differences, Neuron Visualisations, and Visual Counterfactual Explanations
While deep learning has led to huge progress in complex image classification tasks like
ImageNet, unexpected failure modes, e.g. via spurious features, call into question how reliably …