Counterfactual explanations and how to find them: literature review and benchmarking
R Guidotti - Data Mining and Knowledge Discovery, 2024 - Springer
Interpretable machine learning aims at unveiling the reasons behind predictions returned by
uninterpretable classifiers. One of the most valuable types of explanation consists of …
Adversarial machine learning for network intrusion detection systems: A comprehensive survey
Network-based Intrusion Detection System (NIDS) forms the frontline defence against
network attacks that compromise the security of the data, systems, and networks. In recent …
Universal and transferable adversarial attacks on aligned language models
Because "out-of-the-box" large language models are capable of generating a great deal of
objectionable content, recent work has focused on aligning these models in an attempt to …
Glaze: Protecting artists from style mimicry by Text-to-Image models
Recent text-to-image diffusion models such as MidJourney and Stable Diffusion threaten to
displace many in the professional artist community. In particular, models can learn to mimic …
Certifying LLM safety against adversarial prompting
Large language models (LLMs) are vulnerable to adversarial attacks that add malicious
tokens to an input prompt to bypass the safety guardrails of an LLM and cause it to produce …
Visual adversarial examples jailbreak aligned large language models
Warning: this paper contains data, prompts, and model outputs that are offensive in nature.
Recently, there has been a surge of interest in integrating vision into Large Language …
Improving robustness using generated data
Recent work argues that robust training requires substantially larger datasets than those
required for standard classification. On CIFAR-10 and CIFAR-100, this translates into a …
Data augmentation can improve robustness
Adversarial training suffers from robust overfitting, a phenomenon where the robust test
accuracy starts to decrease during training. In this paper, we focus on reducing robust …
Lira: Learnable, imperceptible and robust backdoor attacks
Recently, machine learning models have been shown to be vulnerable to backdoor
attacks, primarily due to the lack of transparency in black-box models such as deep neural …
Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks
The field of defense strategies against adversarial attacks has grown significantly over
recent years, but progress is hampered as the evaluation of adversarial defenses is often …