Jailbroken: How does LLM safety training fail?
Large language models trained for safety and harmlessness remain susceptible to
adversarial misuse, as evidenced by the prevalence of “jailbreak” attacks on early releases …
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …
TrustLLM: Trustworthiness in large language models
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …
Pretraining language models with human preferences
Language models (LMs) are pretrained to imitate text from large and diverse
datasets that contain content that would violate human preferences if generated by an LM …
GPT-4 is too smart to be safe: Stealthy chat with LLMs via cipher
Safety lies at the core of the development of Large Language Models (LLMs). There is
ample work on aligning LLMs with human ethics and preferences, including data filtering in …
Contemporary approaches in evolving language models
This article provides a comprehensive survey of contemporary language modeling
approaches within the realm of natural language processing (NLP) tasks. This paper …
DeepInception: Hypnotize large language model to be jailbreaker
Despite remarkable success in various applications, large language models (LLMs) are
vulnerable to adversarial jailbreaks that make the safety guardrails void. However, previous …
Position: TrustLLM: Trustworthiness in large language models
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …
Factuality enhanced language models for open-ended text generation
Pretrained language models (LMs) are susceptible to generating text with nonfactual
information. In this work, we measure and improve the factual accuracy of large-scale LMs …
A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily
Large Language Models (LLMs), such as ChatGPT and GPT-4, are designed to provide
useful and safe responses. However, adversarial prompts known as 'jailbreaks' can …