The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

DeepSeekMath: Pushing the limits of mathematical reasoning in open language models

Z Shao, P Wang, Q Zhu, R Xu, J Song, X Bi… - arXiv preprint arXiv …, 2024 - arxiv.org
Mathematical reasoning poses a significant challenge for language models due to its
complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which …

Toward self-improvement of LLMs via imagination, searching, and criticizing

Y Tian, B Peng, L Song, L Jin, D Yu… - Advances in Neural …, 2025 - proceedings.neurips.cc
Despite the impressive capabilities of Large Language Models (LLMs) on various tasks, they
still struggle with scenarios that involve complex reasoning and planning. Self-correction …

Recursive introspection: Teaching language model agents how to self-improve

Y Qu, T Zhang, N Garg… - Advances in Neural …, 2025 - proceedings.neurips.cc
A central piece in enabling intelligent agentic behavior in foundation models is to make them
capable of introspecting upon their behavior, reasoning, and correcting their mistakes as …

V-STaR: Training verifiers for self-taught reasoners

A Hosseini, X Yuan, N Malkin, A Courville… - arXiv preprint arXiv …, 2024 - arxiv.org
Common self-improvement approaches for large language models (LLMs), such as STaR,
iteratively fine-tune LLMs on self-generated solutions to improve their problem-solving …

InternLM-Math: Open math large language models toward verifiable reasoning

H Ying, S Zhang, L Li, Z Zhou, Y Shao, Z Fei… - arXiv preprint arXiv …, 2024 - arxiv.org
The math abilities of large language models can represent their abstract reasoning ability. In
this paper, we introduce and open-source our math reasoning LLM InternLM-Math, which is …

Chain of preference optimization: Improving chain-of-thought reasoning in LLMs

X Zhang, C Du, T Pang, Q Liu… - Advances in Neural …, 2025 - proceedings.neurips.cc
The recent development of chain-of-thought (CoT) decoding has enabled large language
models (LLMs) to generate explicit logical reasoning paths for complex problem-solving …

Easy-to-hard generalization: Scalable alignment beyond human supervision

Z Sun, L Yu, Y Shen, W Liu, Y Yang, S Welleck… - arXiv preprint arXiv …, 2024 - arxiv.org
Current AI alignment methodologies rely on human-provided demonstrations or judgments,
and the learned capabilities of AI systems would be upper-bounded by human capabilities …

Step-DPO: Step-wise preference optimization for long-chain reasoning of LLMs

X Lai, Z Tian, Y Chen, S Yang, X Peng, J Jia - arXiv preprint arXiv …, 2024 - arxiv.org
Mathematical reasoning presents a significant challenge for Large Language Models
(LLMs) due to the extensive and precise chain of reasoning required for accuracy. Ensuring …

NExT: Teaching large language models to reason about code execution

A Ni, M Allamanis, A Cohan, Y Deng, K Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
A fundamental skill among human developers is the ability to understand and reason about
program execution. As an example, a programmer can mentally simulate code execution in …