AI alignment: A comprehensive survey
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, so do risks from misalignment. To provide a comprehensive …
Benchmark data contamination of large language models: A survey
The rapid development of Large Language Models (LLMs) like GPT-4, Claude-3, and
Gemini has transformed the field of natural language processing. However, it has also …
Benchmarking benchmark leakage in large language models
Amid the expanding use of pre-training data, the phenomenon of benchmark dataset
leakage has become increasingly prominent, exacerbated by opaque training processes …
PromptBench: A unified library for evaluation of large language models
The evaluation of large language models (LLMs) is crucial to assess their performance and
mitigate potential security risks. In this paper, we introduce PromptBench, a unified library to …
KIEval: A knowledge-grounded interactive evaluation framework for large language models
Automatic evaluation methods for large language models (LLMs) are hindered by data
contamination, leading to inflated assessments of their effectiveness. Existing strategies …
DARG: Dynamic evaluation of large language models via adaptive reasoning graph
The current paradigm of evaluating Large Language Models (LLMs) through static
benchmarks comes with significant limitations, such as vulnerability to data contamination …
The landscape of emerging AI agent architectures for reasoning, planning, and tool calling: A survey
This survey paper examines the recent advancements in AI agent implementations, with a
focus on their ability to achieve complex goals that require enhanced reasoning, planning …
NPHardEval: Dynamic benchmark on reasoning ability of large language models via complexity classes
Complex reasoning ability is one of the most important features of current LLMs, which has
also been leveraged to play an integral role in complex decision-making tasks. Therefore …
GraphInstruct: Empowering large language models with graph understanding and reasoning capability
Evaluating and enhancing the general capabilities of large language models (LLMs) has
been an important research topic. Graph is a common data structure in the real world, and …
Co-occurrence is not factual association in language models
Pretrained language models can encode a large amount of knowledge and utilize it for
various reasoning tasks, yet they can still struggle to learn novel factual knowledge …