ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors
The safety of Large Language Models (LLMs) has gained increasing attention in recent
years, but there is still no comprehensive approach for detecting safety issues within …
Detoxifying large language models via knowledge editing
This paper investigates using knowledge editing techniques to detoxify Large Language
Models (LLMs). We construct a benchmark, SafeEdit, which covers nine unsafe categories …
High-dimension human value representation in large language models
The widespread application of Large Language Models (LLMs) across various tasks and
fields has necessitated the alignment of these models with human values and preferences …
Human-Readable Adversarial Prompts: An Investigation into LLM Vulnerabilities Using Situational Context
Previous research on LLM vulnerabilities often relied on nonsensical adversarial prompts,
which were easily detectable by automated methods. We address this gap by focusing on …
Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context
Previous research on testing the vulnerabilities in Large Language Models (LLMs) using
adversarial attacks has primarily focused on nonsensical prompt injections, which are easily …
Detoxifying Large Language Models via Kahneman-Tversky Optimization
Currently, the application of Large Language Models (LLMs) faces significant security
threats. Harmful questions and adversarial attack prompts can induce LLMs to generate …