AI alignment: A comprehensive survey
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, so do risks from misalignment. To provide a comprehensive …
Discovering language model behaviors with model-written evaluations
As language models (LMs) scale, they develop many novel behaviors, good and bad,
exacerbating the need to evaluate how they behave. Prior work creates evaluations with …
Attack prompt generation for red teaming and defending large language models
Large language models (LLMs) are susceptible to red teaming attacks, which can induce
LLMs to generate harmful content. Previous research constructs attack prompts via manual …
Gaining wisdom from setbacks: Aligning large language models via mistake analysis
The rapid development of large language models (LLMs) has not only provided numerous
opportunities but also presented significant challenges. This becomes particularly evident …
AutoDetect: Towards a unified framework for automated weakness detection in large language models
Although Large Language Models (LLMs) are becoming increasingly powerful, they still
exhibit significant but subtle weaknesses, such as mistakes in instruction-following or coding …
InstructSafety: a unified framework for building multidimensional and explainable safety detector through instruction tuning
Safety detection has been an increasingly important topic in recent years and it has become
even more necessary to develop reliable safety detection systems with the rapid …
Towards safer generative language models: A survey on safety risks, evaluations, and improvements
As generative large model capabilities advance, safety concerns become more pronounced
in their outputs. To ensure the sustainable growth of the AI ecosystem, it's imperative to …
An Auditing Test to Detect Behavioral Shift in Language Models
As language models (LMs) approach human-level performance, a comprehensive
understanding of their behavior becomes crucial. This includes evaluating capabilities …
CMD: a framework for Context-aware Model self-Detoxification
Text detoxification aims to minimize the risk of language models producing toxic content.
Existing detoxification methods of directly constraining the model output or further training …
What's the most important value? INVP: INvestigating the Value Priorities of LLMs through Decision-making in Social Scenarios
As large language models (LLMs) demonstrate impressive performance in various tasks and
are increasingly integrated into the decision-making process, ensuring they align with …