Kimi k1.5: Scaling reinforcement learning with LLMs
K Team, A Du, B Gao, B Xing, C Jiang, C Chen… - arXiv preprint arXiv …, 2025 - arxiv.org
Language model pretraining with next token prediction has proved effective for scaling
compute but is limited to the amount of available training data. Scaling reinforcement …
Process reinforcement through implicit rewards
Dense process rewards have proven a more effective alternative to the sparse outcome-
level rewards in the inference-time scaling of large language models (LLMs), particularly in …
A Survey on Large Language Models with some Insights on their Capabilities and Limitations
A Matarazzo, R Torlone - arXiv preprint arXiv:2501.04040, 2025 - arxiv.org
The rapid advancement of artificial intelligence, particularly with the development of Large
Language Models (LLMs) built on the transformer architecture, has redefined the …
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
We introduce MedXpertQA, a highly challenging and comprehensive benchmark to evaluate
expert-level medical knowledge and advanced reasoning. MedXpertQA includes 4,460 …
Critique fine-tuning: Learning to critique is more effective than learning to imitate
Supervised Fine-Tuning (SFT) is commonly used to train language models to imitate
annotated responses for given instructions. In this paper, we challenge this paradigm and …
Optimizing Temperature for Language Models with Multi-Sample Inference
Multi-sample aggregation strategies, such as majority voting and best-of-N sampling, are
widely used in contemporary large language models (LLMs) to enhance predictive accuracy …
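The two aggregation strategies named in this abstract can be sketched briefly. The following is a minimal, hypothetical illustration (the function names and toy scores are not from the paper), assuming majority voting picks the most frequent answer among N samples and best-of-N picks the highest-scoring sample under some external scorer:

```python
# Hypothetical sketch of two multi-sample aggregation strategies.
from collections import Counter

def majority_vote(answers):
    """Return the most frequent answer among N sampled completions."""
    return Counter(answers).most_common(1)[0][0]

def best_of_n(samples, score_fn):
    """Return the sample ranked highest by a scorer (e.g. a verifier)."""
    return max(samples, key=score_fn)

# Five sampled answers to the same prompt.
votes = ["42", "41", "42", "42", "40"]
print(majority_vote(votes))  # -> 42

# Best-of-N with toy (answer, score) pairs; the scores are illustrative.
scored = [("a", 0.2), ("b", 0.9), ("c", 0.5)]
print(best_of_n(scored, score_fn=lambda s: s[1])[0])  # -> b
```

Sampling temperature matters for both: it controls the diversity of the N candidates being aggregated, which is the knob the paper's title refers to optimizing.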
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis
Recent advances in text-based large language models (LLMs), particularly in the GPT series
and the o1 model, have demonstrated the effectiveness of scaling both training-time and …
AI-driven materials design: a mini-review
Materials design is an important component of modern science and technology, yet
traditional approaches rely heavily on trial-and-error and can be inefficient. Computational …
SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities
F Jiang, Z Xu, Y Li, L Niu, Z Xiang, B Li, BY Lin… - arXiv preprint arXiv …, 2025 - arxiv.org
Emerging large reasoning models (LRMs), such as DeepSeek-R1 models, leverage long
chain-of-thought (CoT) reasoning to generate structured intermediate steps, enhancing their …
Improving Video Generation with Human Feedback
Video generation has achieved significant advances through rectified flow techniques, but
issues like unsmooth motion and misalignment between videos and prompts persist. In this …