Explainable generative AI (GenXAI): A survey, conceptualization, and research agenda

J Schneider - Artificial Intelligence Review, 2024 - Springer
Generative AI (GenAI) represents a shift from AI's ability to “recognize” to its ability to
“generate” solutions for a wide range of tasks. As generated solutions and applications grow …

Identifying and mitigating vulnerabilities in LLM-integrated applications

F Jiang - 2024 - search.proquest.com
Large language models (LLMs) are increasingly deployed as the backend for various
applications, including code completion tools and AI-powered search engines. Unlike …

Explainability for large language models: A survey

H Zhao, H Chen, F Yang, N Liu, H Deng, H Cai… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated impressive capabilities in natural
language processing. However, their internal mechanisms are still unclear and this lack of …

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu… - NeurIPS, 2023 - blogs.qub.ac.uk
Abstract Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

Robust prompt optimization for defending language models against jailbreaking attacks

A Zhou, B Li, H Wang - arXiv preprint arXiv:2401.17263, 2024 - arxiv.org
Despite advances in AI alignment, large language models (LLMs) remain vulnerable to
adversarial attacks or jailbreaking, in which adversaries can modify prompts to induce …

Exploring the limits of domain-adaptive training for detoxifying large-scale language models

B Wang, W Ping, C Xiao, P Xu… - Advances in …, 2022 - proceedings.neurips.cc
Pre-trained language models (LMs) are shown to easily generate toxic language. In this
work, we systematically explore domain-adaptive training to reduce the toxicity of language …

An LLM can fool itself: A prompt-based adversarial attack

X Xu, K Kong, N Liu, L Cui, D Wang, J Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
The wide-ranging applications of large language models (LLMs), especially in safety-critical
domains, necessitate the proper evaluation of the LLM's adversarial robustness. This paper …

Exposing the Achilles' heel of textual hate speech classifiers using indistinguishable adversarial examples

S Aggarwal, DK Vishwakarma - Expert Systems with Applications, 2024 - Elsevier
The accessibility of online hate speech has increased significantly, making it crucial for
social-media companies to prioritize efforts to curb its spread. Although deep learning …

Transferable adversarial distribution learning: Query-efficient adversarial attack against large language models

H Dong, J Dong, S Wan, S Yuan, Z Guan - Computers & Security, 2023 - Elsevier
It is a challenging task to fool a text classifier based on deep neural networks under the
black-box setting where the target model can only be queried. Among the existing black-box …

Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges

P Kumar - International Journal of Multimedia Information …, 2024 - Springer
Large language models (LLMs) have exhibited remarkable efficacy and proficiency in a
wide array of NLP endeavors. Nevertheless, concerns are growing rapidly regarding the …