Kimi k1.5: Scaling Reinforcement Learning with LLMs

K Team, A Du, B Gao, B Xing, C Jiang, C Chen… - arXiv preprint arXiv …, 2025 - arxiv.org
Language model pretraining with next token prediction has proved effective for scaling
compute but is limited to the amount of available training data. Scaling reinforcement …

Process reinforcement through implicit rewards

G Cui, L Yuan, Z Wang, H Wang, W Li, B He… - arXiv preprint arXiv …, 2025 - arxiv.org
Dense process rewards have proven a more effective alternative to the sparse outcome-
level rewards in the inference-time scaling of large language models (LLMs), particularly in …

A Survey on Large Language Models with some Insights on their Capabilities and Limitations

A Matarazzo, R Torlone - arXiv preprint arXiv:2501.04040, 2025 - arxiv.org
The rapid advancement of artificial intelligence, particularly with the development of Large
Language Models (LLMs) built on the transformer architecture, has redefined the …

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Y Zuo, S Qu, Y Li, Z Chen, X Zhu, E Hua… - arXiv preprint arXiv …, 2025 - arxiv.org
We introduce MedXpertQA, a highly challenging and comprehensive benchmark to evaluate
expert-level medical knowledge and advanced reasoning. MedXpertQA includes 4,460 …

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

Y Wang, X Yue, W Chen - arXiv preprint arXiv:2501.17703, 2025 - arxiv.org
Supervised Fine-Tuning (SFT) is commonly used to train language models to imitate
annotated responses for given instructions. In this paper, we challenge this paradigm and …

Optimizing Temperature for Language Models with Multi-Sample Inference

W Du, Y Yang, S Welleck - arXiv preprint arXiv:2502.05234, 2025 - arxiv.org
Multi-sample aggregation strategies, such as majority voting and best-of-N sampling, are
widely used in contemporary large language models (LLMs) to enhance predictive accuracy …

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Z Ye, X Zhu, CM Chan, X Wang, X Tan, J Lei… - arXiv preprint arXiv …, 2025 - arxiv.org
Recent advances in text-based large language models (LLMs), particularly in the GPT series
and the o1 model, have demonstrated the effectiveness of scaling both training-time and …

AI-driven materials design: a mini-review

M Cheng, CL Fu, R Okabe, A Chotrattanapituk… - arXiv preprint arXiv …, 2025 - arxiv.org
Materials design is an important component of modern science and technology, yet
traditional approaches rely heavily on trial-and-error and can be inefficient. Computational …

SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities

F Jiang, Z Xu, Y Li, L Niu, Z Xiang, B Li, BY Lin… - arXiv preprint arXiv …, 2025 - arxiv.org
Emerging large reasoning models (LRMs), such as DeepSeek-R1 models, leverage long
chain-of-thought (CoT) reasoning to generate structured intermediate steps, enhancing their …

Improving Video Generation with Human Feedback

J Liu, G Liu, J Liang, Z Yuan, X Liu, M Zheng… - arXiv preprint arXiv …, 2025 - arxiv.org
Video generation has achieved significant advances through rectified flow techniques, but
issues like unsmooth motion and misalignment between videos and prompts persist. In this …