Mathematical discoveries from program search with large language models
Large language models (LLMs) have demonstrated tremendous capabilities in solving
complex tasks, from quantitative reasoning to understanding natural language. However …
Quiet-STaR: Language models can teach themselves to think before speaking
When writing and talking, people sometimes pause to think. Although reasoning-focused
works have often framed reasoning as a method of answering questions or completing …
CoderEval: A benchmark of pragmatic code generation with generative pre-trained models
Code generation models based on the pre-training and fine-tuning paradigm have been
increasingly attempted by both academia and industry, resulting in well-known industrial …
Buffer of thoughts: Thought-augmented reasoning with large language models
We introduce Buffer of Thoughts (BoT), a novel and versatile thought-augmented
reasoning approach for enhancing accuracy, efficiency and robustness of large language …
Phenomenal yet puzzling: Testing inductive reasoning capabilities of language models with hypothesis refinement
The ability to derive underlying principles from a handful of observations and then
generalize to novel situations--known as inductive reasoning--is central to human …
CRUXEval: A benchmark for code reasoning, understanding and execution
We present CRUXEval (Code Reasoning, Understanding, and eXecution Evaluation), a
benchmark consisting of 800 Python functions (3-13 lines). Each function comes with an …
If LLM is the wizard, then code is the wand: A survey on how code empowers large language models to serve as intelligent agents
The prominent large language models (LLMs) of today differ from past language models not
only in size, but also in the fact that they are trained on a combination of natural language …
SelfEvolve: A code evolution framework via large language models
Large language models (LLMs) have already revolutionized code generation, after being
pretrained on publicly available code data. However, while various methods have been …
Language model crossover: Variation through few-shot prompting
This article pursues the insight that language models naturally enable an intelligent variation
operator similar in spirit to evolutionary crossover. In particular, language models of …
Execution-based evaluation for open-domain code generation
To extend the scope of coding queries to more realistic settings, we propose ODEX, the first
Open-Domain EXecution-based natural language (NL) to Python code generation dataset …