Security and privacy challenges of large language models: A survey

BC Das, MH Amini, Y Wu - ACM Computing Surveys, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated extraordinary capabilities and
contributed to multiple fields, such as generating and summarizing text, language …

The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu, Q Zhang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

Jailbreak and guard aligned language models with only few in-context demonstrations

Z Wei, Y Wang, A Li, Y Mo, Y Wang - arXiv preprint arXiv:2310.06387, 2023 - arxiv.org
Large Language Models (LLMs) have shown remarkable success in various tasks, yet their
safety and the risk of generating harmful content remain pressing concerns. In this paper, we …

Position: TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu… - International …, 2024 - proceedings.mlr.press
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

SmoothLLM: Defending large language models against jailbreaking attacks

A Robey, E Wong, H Hassani, GJ Pappas - arXiv preprint arXiv …, 2023 - arxiv.org
Despite efforts to align large language models (LLMs) with human values, widely-used
LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks …

Images are Achilles' heel of alignment: Exploiting visual vulnerabilities for jailbreaking multimodal large language models

Y Li, H Guo, K Zhou, WX Zhao, JR Wen - European Conference on …, 2024 - Springer
In this paper, we study the harmlessness alignment problem of multimodal large language
models (MLLMs). We conduct a systematic empirical analysis of the harmlessness …

On protecting the data privacy of large language models (LLMs): A survey

B Yan, K Li, M Xu, Y Dong, Y Zhang, Z Ren… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are complex artificial intelligence systems capable of
understanding, generating, and translating human language. They learn language patterns …

Refusal in language models is mediated by a single direction

A Arditi, O Obeso, A Syed, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
Conversational large language models are fine-tuned for both instruction-following and
safety, resulting in models that obey benign requests but refuse harmful ones. While this …