Jailbroken: How does LLM safety training fail?
Large language models trained for safety and harmlessness remain susceptible to
adversarial misuse, as evidenced by the prevalence of “jailbreak” attacks on early releases …
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …
TrustLLM: Trustworthiness in large language models
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …
Pretraining language models with human preferences
Language models (LMs) are pretrained to imitate text from large and diverse
datasets that contain content that would violate human preferences if generated by an LM …
GPT-4 is too smart to be safe: Stealthy chat with LLMs via cipher
Safety lies at the core of the development of Large Language Models (LLMs). There is
ample work on aligning LLMs with human ethics and preferences, including data filtering in …
Contemporary approaches in evolving language models
This article provides a comprehensive survey of contemporary language modeling
approaches within the realm of natural language processing (NLP) tasks. This paper …
DeepInception: Hypnotize large language model to be jailbreaker
Despite remarkable success in various applications, large language models (LLMs) are
vulnerable to adversarial jailbreaks that make the safety guardrails void. However, previous …
Position: TrustLLM: Trustworthiness in large language models
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …
Factuality enhanced language models for open-ended text generation
Pretrained language models (LMs) are susceptible to generating text with nonfactual
information. In this work, we measure and improve the factual accuracy of large-scale LMs …
A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily
Large Language Models (LLMs), such as ChatGPT and GPT-4, are designed to provide
useful and safe responses. However, adversarial prompts known as 'jailbreaks' can …