Security and privacy challenges of large language models: A survey
Large language models (LLMs) have demonstrated extraordinary capabilities and
contributed to multiple fields, such as generating and summarizing text, language …
A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly
Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized
natural language understanding and generation. They possess deep language …
Jailbroken: How does llm safety training fail?
Large language models trained for safety and harmlessness remain susceptible to
adversarial misuse, as evidenced by the prevalence of “jailbreak” attacks on early releases …
"Do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models
The misuse of large language models (LLMs) has drawn significant attention from the
general public and LLM vendors. One particular type of adversarial prompt, known as …
Challenges and applications of large language models
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …
Large language models cannot self-correct reasoning yet
Large Language Models (LLMs) have emerged as a groundbreaking technology with their
unparalleled text generation capabilities across various applications. Nevertheless …
Open problems and fundamental limitations of reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …
DecodingTrust: A comprehensive assessment of trustworthiness in GPT models
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …
Defending ChatGPT against jailbreak attack via self-reminders
ChatGPT is a societally impactful artificial intelligence tool with millions of users and
integration into products such as Bing. However, the emergence of jailbreak attacks notably …
Tree of attacks: Jailbreaking black-box LLMs automatically
While Large Language Models (LLMs) display versatile functionality, they continue
to generate harmful, biased, and toxic content, as demonstrated by the prevalence of human …