Combating misinformation in the age of LLMs: Opportunities and challenges
Misinformation such as fake news and rumors is a serious threat for information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …
A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly
Abstract Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized
natural language understanding and generation. They possess deep language …
Foundational challenges in assuring alignment and safety of large language models
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …
MM-SafetyBench: A benchmark for safety evaluation of multimodal large language models
The security concerns surrounding Large Language Models (LLMs) have been extensively
explored, yet the safety of Multimodal Large Language Models (MLLMs) remains …
Images are Achilles' heel of alignment: Exploiting visual vulnerabilities for jailbreaking multimodal large language models
In this paper, we study the harmlessness alignment problem of multimodal large language
models (MLLMs). We conduct a systematic empirical analysis of the harmlessness …
How Johnny can persuade LLMs to jailbreak them: Rethinking persuasion to challenge AI safety by humanizing LLMs
Most traditional AI safety research has approached AI models as machines and centered on
algorithm-focused attacks developed by security experts. As large language models (LLMs) …
Red-Teaming for generative AI: Silver bullet or security theater?
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …
Jatmo: Prompt injection defense by task-specific fine-tuning
Abstract Large Language Models (LLMs) are attracting significant research attention due to
their instruction-following abilities, allowing users and developers to leverage LLMs for a …
Privacy in large language models: Attacks, defenses and future directions
The advancement of large language models (LLMs) has significantly enhanced the ability to
effectively tackle various downstream NLP tasks and unify these tasks into generative …
MLLM-Protector: Ensuring MLLM's safety without hurting performance
The deployment of multimodal large language models (MLLMs) has brought forth a unique
vulnerability: susceptibility to malicious attacks through visual inputs. This paper investigates …