AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Physical adversarial attack meets computer vision: A decade survey

H Wei, H Tang, X Jia, Z Wang, H Yu, Z Li… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Despite the impressive achievements of Deep Neural Networks (DNNs) in computer vision,
their vulnerability to adversarial attacks remains a critical concern. Extensive research has …

Toward transparent AI: A survey on interpreting the inner structures of deep neural networks

T Räuker, A Ho, S Casper… - 2023 IEEE conference …, 2023 - ieeexplore.ieee.org
The last decade of machine learning has seen drastic increases in scale and capabilities.
Deep neural networks (DNNs) are increasingly being deployed in the real world. However …

Black-box access is insufficient for rigorous AI audits

S Casper, C Ezell, C Siegmann, N Kolt… - The 2024 ACM …, 2024 - dl.acm.org
External audits of AI systems are increasingly recognized as a key mechanism for AI
governance. The effectiveness of an audit, however, depends on the degree of access …

Red teaming deep neural networks with feature synthesis tools

S Casper, T Bu, Y Li, J Li, K Zhang… - Advances in …, 2023 - proceedings.neurips.cc
Interpretable AI tools are often motivated by the goal of understanding model behavior in out-
of-distribution (OOD) contexts. Despite the attention this area of study receives, there are …

International Scientific Report on the Safety of Advanced AI (Interim Report)

Y Bengio, S Mindermann, D Privitera… - arXiv preprint arXiv …, 2024 - arxiv.org
This is the interim publication of the first International Scientific Report on the Safety of
Advanced AI. The report synthesises the scientific understanding of general-purpose AI--AI …

A survey on physical adversarial attack in computer vision

D Wang, W Yao, T Jiang, G Tang, X Chen - arXiv preprint arXiv …, 2022 - arxiv.org
Over the past decade, deep learning has revolutionized conventional tasks that rely on hand-
craft feature extraction with its strong feature learning capability, leading to substantial …

Discovering bugs in vision models using off-the-shelf image generation and captioning

O Wiles, I Albuquerque, S Gowal - arXiv preprint arXiv:2208.08831, 2022 - arxiv.org
Automatically discovering failures in vision models under real-world settings remains an
open challenge. This work demonstrates how off-the-shelf, large-scale, image-to-text and …

Physical adversarial attacks for camera-based smart systems: Current trends, categorization, applications, research challenges, and future outlook

A Guesmi, MA Hanif, B Ouni, M Shafique - IEEE Access, 2023 - ieeexplore.ieee.org
Deep Neural Networks (DNNs) have shown impressive performance in computer vision
tasks; however, their vulnerability to adversarial attacks raises concerns regarding their …

DiG-IN: Diffusion Guidance for Investigating Networks - Uncovering Classifier Differences, Neuron Visualisations, and Visual Counterfactual Explanations

M Augustin, Y Neuhaus, M Hein - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
While deep learning has led to huge progress in complex image classification tasks like
ImageNet, unexpected failure modes, e.g. via spurious features, call into question how reliably …