A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability

X Huang, D Kroening, W Ruan, J Sharp, Y Sun… - Computer Science …, 2020 - Elsevier
In the past few years, significant progress has been made on deep neural networks (DNNs)
in achieving human-level performance on several long-standing tasks. With the broader …

Backdoor attacks and countermeasures on deep learning: A comprehensive review

Y Gao, BG Doan, Z Zhang, S Ma, J Zhang, A Fu… - arXiv preprint arXiv …, 2020 - arxiv.org
This work provides the community with a timely comprehensive review of backdoor attacks
and countermeasures on deep learning. According to the attacker's capability and affected …

Universal and transferable adversarial attacks on aligned language models

A Zou, Z Wang, N Carlini, M Nasr, JZ Kolter… - arXiv preprint arXiv …, 2023 - arxiv.org
Because "out-of-the-box" large language models are capable of generating a great deal of
objectionable content, recent work has focused on aligning these models in an attempt to …

Weight poisoning attacks on pre-trained models

K Kurita, P Michel, G Neubig - arXiv preprint arXiv:2004.06660, 2020 - arxiv.org
Recently, NLP has seen a surge in the usage of large pre-trained models. Users download
weights of models pre-trained on large datasets, then fine-tune the weights on a task of their …

Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples

S Hussain, P Neekhara, M Jere… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recent advances in video manipulation techniques have made the generation of fake
videos more accessible than ever before. Manipulated videos can fuel disinformation and …

AdvPulse: Universal, synchronization-free, and targeted audio adversarial attacks via subsecond perturbations

Z Li, Y Wu, J Liu, Y Chen, B Yuan - Proceedings of the 2020 ACM …, 2020 - dl.acm.org
Existing efforts in audio adversarial attacks only focus on the scenarios where an adversary
has prior knowledge of the entire speech input so as to generate an adversarial example by …

A survey on universal adversarial attack

C Zhang, P Benz, C Lin, A Karjauv, J Wu… - arXiv preprint arXiv …, 2021 - arxiv.org
The intriguing phenomenon of adversarial examples has attracted significant attention in
machine learning and what might be more surprising to the community is the existence of …

A survey on voice assistant security: Attacks and countermeasures

C Yan, X Ji, K Wang, Q Jiang, Z Jin, W Xu - ACM Computing Surveys, 2022 - dl.acm.org
Voice assistants (VA) have become prevalent on a wide range of personal devices such as
smartphones and smart speakers. As companies build voice assistants with extra …

Adversarial threats to deepfake detection: A practical perspective

P Neekhara, B Dolhansky, J Bitton… - Proceedings of the …, 2021 - openaccess.thecvf.com
Facially manipulated images and videos or DeepFakes can be used maliciously to fuel
misinformation or defame individuals. Therefore, detecting DeepFakes is crucial to increase …

Data-free universal adversarial perturbation and black-box attack

C Zhang, P Benz, A Karjauv… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Universal adversarial perturbation (UAP), i.e., a single perturbation to fool the network for most
images, is widely recognized as a more practical attack because the UAP can be generated …