Combating misinformation in the age of LLMs: Opportunities and challenges

C Chen, K Shu - AI Magazine, 2024 - Wiley Online Library
Misinformation such as fake news and rumors poses a serious threat to information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …

Large language model supply chain: A research agenda

S Wang, Y Zhao, X Hou, H Wang - ACM Transactions on Software …, 2024 - dl.acm.org
The rapid advancement of large language models (LLMs) has revolutionized artificial
intelligence, introducing unprecedented capabilities in natural language processing and …

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models

L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang… - Journal of Artificial …, 2025 - jair.org
Generative models are rapidly gaining popularity and being integrated into everyday
applications, raising concerns over their safe use as various vulnerabilities are exposed. In …

Weak-to-strong jailbreaking on large language models

X Zhao, X Yang, T Pang, C Du, L Li, YX Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Although significant efforts have been dedicated to aligning large language models (LLMs),
red-teaming reports suggest that these carefully aligned LLMs could still be jailbroken …

Escalation risks from language models in military and diplomatic decision-making

JP Rivera, G Mukobi, A Reuel, M Lamparth… - The 2024 ACM …, 2024 - dl.acm.org
Governments are increasingly considering integrating autonomous AI agents in high-stakes
military and foreign-policy decision-making, especially with the emergence of advanced …

Mission impossible: A statistical perspective on jailbreaking LLMs

J Su, J Kempe, K Ullrich - Advances in Neural Information …, 2025 - proceedings.neurips.cc
Large language models (LLMs) are trained on a deluge of text data with limited quality
control. As a result, LLMs can exhibit unintended or even harmful behaviours, such as …

Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs

X Zhao, L Li, YX Wang - arXiv preprint arXiv:2402.05864, 2024 - arxiv.org
In this paper, we propose a new decoding method called Permute-and-Flip (PF) decoder. It
enjoys robustness properties similar to the standard sampling decoder, but is provably up to …

CodeChameleon: Personalized encryption framework for jailbreaking large language models

H Lv, X Wang, Y Zhang, C Huang, S Dou, J Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
Adversarial misuse, particularly through 'jailbreaking' that circumvents a model's safety and
ethical protocols, poses a significant challenge for Large Language Models (LLMs). This …

Rapid optimization for jailbreaking LLMs via subconscious exploitation and echopraxia

G Shen, S Cheng, K Zhang, G Tao, S An, L Yan… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have become prevalent across diverse sectors,
transforming human life with their extraordinary reasoning and comprehension abilities. As …

Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI

A Rawat, S Schoepf, G Zizzo, G Cornacchia… - arXiv preprint arXiv …, 2024 - arxiv.org
As generative AI, particularly large language models (LLMs), become increasingly
integrated into production applications, new attack surfaces and vulnerabilities emerge and …