Survey of vulnerabilities in large language models revealed by adversarial attacks

E Shayegani, MAA Mamun, Y Fu, P Zaree… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as
they integrate more deeply into complex systems, the urgency to scrutinize their security …

A survey of adversarial defenses and robustness in NLP

S Goyal, S Doddapaneni, MM Khapra… - ACM Computing …, 2023 - dl.acm.org
In the past few years, it has become increasingly evident that deep neural networks are not
resilient enough to withstand adversarial perturbations in input data, leaving them …

Trustworthy llms: a survey and guideline for evaluating large language models' alignment

Y Liu, Y Yao, JF Ton, X Zhang, R Guo, H Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Ensuring alignment, which refers to making models behave in accordance with human
intentions [1, 2], has become a critical task before deploying large language models (LLMs) …

Easily accessible text-to-image generation amplifies demographic stereotypes at large scale

F Bianchi, P Kalluri, E Durmus, F Ladhak… - Proceedings of the …, 2023 - dl.acm.org
Machine learning models that convert user-written text descriptions into images are now
widely available online and used by millions of users to generate millions of images a day …

Red teaming language models with language models

E Perez, S Huang, F Song, T Cai, R Ring… - arXiv preprint arXiv …, 2022 - arxiv.org
Language Models (LMs) often cannot be deployed because of their potential to harm users
in hard-to-predict ways. Prior work identifies harmful behaviors before deployment by using …

A survey of safety and trustworthiness of large language models through the lens of verification and validation

X Huang, W Ruan, W Huang, G Jin, Y Dong… - Artificial Intelligence …, 2024 - Springer
Large language models (LLMs) have ignited a new heatwave of AI for their ability to
engage end-users in human-level conversations with detailed and articulate answers across …

Algorithmic content moderation: Technical and political challenges in the automation of platform governance

R Gorwa, R Binns, C Katzenbach - Big Data & Society, 2020 - journals.sagepub.com
As government pressure on major technology companies builds, both firms and legislators
are searching for technical solutions to difficult platform governance puzzles such as hate …

Quark: Controllable text generation with reinforced unlearning

X Lu, S Welleck, J Hessel, L Jiang… - Advances in neural …, 2022 - proceedings.neurips.cc
Large-scale language models often learn behaviors that are misaligned with user
expectations. Generated text may contain offensive or toxic language, contain significant …

Weight poisoning attacks on pre-trained models

K Kurita, P Michel, G Neubig - arXiv preprint arXiv:2004.06660, 2020 - arxiv.org
Recently, NLP has seen a surge in the usage of large pre-trained models. Users download
weights of models pre-trained on large datasets, then fine-tune the weights on a task of their …

Mind the style of text! Adversarial and backdoor attacks based on text style transfer

F Qi, Y Chen, X Zhang, M Li, Z Liu, M Sun - arXiv preprint arXiv …, 2021 - arxiv.org
Adversarial attacks and backdoor attacks are two common security threats that hang over
deep learning. Both of them harness task-irrelevant features of data in their implementation …