Security and privacy challenges of large language models: A survey
Large language models (LLMs) have demonstrated extraordinary capabilities and
contributed to multiple fields, such as generating and summarizing text, language …
A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly
Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized
natural language understanding and generation. They possess deep language …
Jailbroken: How does llm safety training fail?
Large language models trained for safety and harmlessness remain susceptible to
adversarial misuse, as evidenced by the prevalence of “jailbreak” attacks on early releases …
"Do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models
The misuse of large language models (LLMs) has drawn significant attention from the
general public and LLM vendors. One particular type of adversarial prompt, known as …
Challenges and applications of large language models
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …
Large language models cannot self-correct reasoning yet
Large Language Models (LLMs) have emerged as a groundbreaking technology with their
unparalleled text generation capabilities across various applications. Nevertheless …
Open problems and fundamental limitations of reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …
DecodingTrust: A comprehensive assessment of trustworthiness in GPT models
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …
Defending ChatGPT against jailbreak attack via self-reminders
ChatGPT is a societally impactful artificial intelligence tool with millions of users and
integration into products such as Bing. However, the emergence of jailbreak attacks notably …
Tree of attacks: Jailbreaking black-box LLMs automatically
While Large Language Models (LLMs) display versatile functionality, they continue
to generate harmful, biased, and toxic content, as demonstrated by the prevalence of human …