LiveBench: A challenging, contamination-free LLM benchmark
Test set contamination, wherein test data from a benchmark ends up in a newer model's
training set, is a well-documented obstacle for fair LLM evaluation and can quickly render …
FrontierMath: A benchmark for evaluating advanced mathematical reasoning in AI
We introduce FrontierMath, a benchmark of hundreds of original, exceptionally challenging
mathematics problems crafted and vetted by expert mathematicians. The questions cover …
ProcessBench: Identifying process errors in mathematical reasoning
As language models regularly make mistakes when solving math problems, automated
identification of errors in the reasoning process becomes increasingly significant for their …
Are Your LLMs Capable of Stable Reasoning?
The rapid advancement of Large Language Models (LLMs) has demonstrated remarkable
progress in complex reasoning tasks. However, a significant discrepancy persists between …
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
With the increasing code reasoning capabilities of existing large language models (LLMs)
and breakthroughs in reasoning models like OpenAI o1 and o3, there is a growing need to …
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics
Recent progress in large language models (LLM) found chain-of-thought prompting
strategies to improve the reasoning ability of LLMs by encouraging problem solving through …
Examining False Positives under Inference Scaling for Mathematical Reasoning
Recent advancements in language models have led to significant improvements in
mathematical reasoning across various benchmarks. However, most of these benchmarks …
On the logical skills of large language models: evaluations using arbitrarily complex first-order logic problems
We present a method of generating first-order logic statements whose complexity can be
controlled along multiple dimensions. We use this method to automatically create several …
Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap
Large language models (LLMs) demonstrate exceptional performance on complex
reasoning tasks. However, despite their strong reasoning capabilities in high-resource …
FastMCTS: A Simple Sampling Strategy for Data Synthesis
Synthetic high-quality multi-step reasoning data can significantly enhance the performance
of large language models on various tasks. However, most existing methods rely on …