Scalable agent alignment via reward modeling: a research direction
One obstacle to applying reinforcement learning algorithms to real-world problems is the
lack of suitable reward functions. Designing such reward functions is difficult in part because …
Taxonomy of machine learning safety: A survey and primer
The open-world deployment of Machine Learning (ML) algorithms in safety-critical
applications such as autonomous vehicles needs to address a variety of ML vulnerabilities …
Certified adversarial robustness via randomized smoothing
We show how to turn any classifier that classifies well under Gaussian noise into a new
classifier that is certifiably robust to adversarial perturbations under the L2 norm. While this …
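The construction this abstract describes (randomized smoothing) can be sketched as follows. This is an illustrative reconstruction of the prediction step only, not the authors' code: `classifier`, `sigma`, and `n_samples` are assumed names, and the statistical certification of the L2 radius is omitted.

```python
import numpy as np

def smoothed_predict(classifier, x, sigma=0.25, n_samples=1000, rng=None):
    """Prediction of the smoothed classifier g(x) = argmax_c P(f(x + noise) = c).

    `classifier` is any function mapping an input array to a class index;
    the smoothed classifier takes a majority vote over Gaussian perturbations.
    """
    rng = np.random.default_rng(rng)
    counts = {}
    for _ in range(n_samples):
        noisy = x + rng.normal(0.0, sigma, size=x.shape)  # isotropic Gaussian noise
        c = classifier(noisy)
        counts[c] = counts.get(c, 0) + 1
    return max(counts, key=counts.get)  # majority class under noise
```

In the full method, the vote counts also feed a hypothesis test that yields a certified L2 radius around `x`; the sketch above returns only the majority-vote label.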
Adversarial glue: A multi-task benchmark for robustness evaluation of language models
Large-scale pre-trained language models have achieved tremendous success across a
wide range of natural language understanding (NLU) tasks, even surpassing human …
Provably robust deep learning via adversarially trained smoothed classifiers
Recent works have shown the effectiveness of randomized smoothing as a scalable
technique for building neural network-based classifiers that are provably robust to $\ell_2 …
Robustness may be at odds with accuracy
We show that there may exist an inherent tension between the goal of adversarial
robustness and that of standard generalization. Specifically, training robust models may not …
Certified robustness to adversarial examples with differential privacy
Adversarial examples that fool machine learning models, particularly deep neural networks,
have been a topic of intense research interest, with attacks and defenses being developed …
When does contrastive learning preserve adversarial robustness from pretraining to finetuning?
Contrastive learning (CL) can learn generalizable feature representations and achieve state-
of-the-art performance of downstream tasks by finetuning a linear classifier on top of it …
On the effectiveness of interval bound propagation for training verifiably robust models
Recent work has shown that it is possible to train deep neural networks that are provably
robust to norm-bounded adversarial perturbations. Most of these methods are based on …
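The core operation behind interval bound propagation can be sketched as follows; this is a minimal illustrative reconstruction, not the paper's implementation, and the function names are assumed. An input box [l, u] is pushed through an affine layer and a ReLU, giving sound elementwise bounds on the layer outputs.

```python
import numpy as np

def interval_affine(l, u, W, b):
    # For y = W x + b, split W into positive and negative parts so each
    # output bound uses the worst-case corner of the input box [l, u].
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    lower = W_pos @ l + W_neg @ u + b
    upper = W_pos @ u + W_neg @ l + b
    return lower, upper

def interval_relu(l, u):
    # ReLU is monotone, so it maps interval endpoints directly.
    return np.maximum(l, 0.0), np.maximum(u, 0.0)
```

Chaining these per-layer bounds through a network yields provable (if loose) bounds on the logits, which training can then tighten.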
Semidefinite relaxations for certifying robustness to adversarial examples
Despite their impressive performance on diverse tasks, neural networks fail catastrophically
in the presence of adversarial inputs—imperceptibly but adversarially perturbed versions of …