Survey of vulnerabilities in large language models revealed by adversarial attacks

E Shayegani, MAA Mamun, Y Fu, P Zaree… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as
they integrate more deeply into complex systems, the urgency to scrutinize their security …

Explainable AI: A review of machine learning interpretability methods

P Linardatos, V Papastefanopoulos, S Kotsiantis - Entropy, 2020 - mdpi.com
Recent advances in artificial intelligence (AI) have led to its widespread industrial adoption,
with machine learning systems demonstrating superhuman performance in a significant …

" do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models

X Shen, Z Chen, M Backes, Y Shen… - Proceedings of the 2024 on …, 2024 - dl.acm.org
The misuse of large language models (LLMs) has drawn significant attention from the
general public and LLM vendors. One particular type of adversarial prompt, known as …

[PDF] DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

B Wang, W Chen, H Pei, C **e, M Kang, C Zhang, C Xu… - NeurIPS, 2023 - blogs.qub.ac.uk
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense

K Krishna, Y Song, M Karpinska… - Advances in Neural …, 2023 - proceedings.neurips.cc
The rise in malicious usage of large language models, such as fake content creation and
academic plagiarism, has motivated the development of approaches that identify AI …

HarmBench: A standardized evaluation framework for automated red teaming and robust refusal

M Mazeika, L Phan, X Yin, A Zou, Z Wang, N Mu… - arXiv preprint arXiv …, 2024 - arxiv.org
Automated red teaming holds substantial promise for uncovering and mitigating the risks
associated with the malicious use of large language models (LLMs), yet the field lacks a …

A survey of data augmentation approaches for NLP

SY Feng, V Gangal, J Wei, S Chandar… - arXiv preprint arXiv …, 2021 - arxiv.org
Data augmentation has recently seen increased interest in NLP due to more work in low-
resource domains, new tasks, and the popularity of large-scale neural networks that require …

In ChatGPT we trust? Measuring and characterizing the reliability of ChatGPT

X Shen, Z Chen, M Backes, Y Zhang - arXiv preprint arXiv:2304.08979, 2023 - arxiv.org
The way users acquire information is undergoing a paradigm shift with the advent of
ChatGPT. Unlike conventional search engines, ChatGPT retrieves knowledge from the …

A survey of safety and trustworthiness of large language models through the lens of verification and validation

X Huang, W Ruan, W Huang, G Jin, Y Dong… - Artificial Intelligence …, 2024 - Springer
Large language models (LLMs) have sparked a new wave of enthusiasm for AI with their ability to
engage end-users in human-level conversations with detailed and articulate answers across …

A survey of adversarial defenses and robustness in NLP

S Goyal, S Doddapaneni, MM Khapra… - ACM Computing …, 2023 - dl.acm.org
In the past few years, it has become increasingly evident that deep neural networks are not
resilient enough to withstand adversarial perturbations in input data, leaving them …