Iterative reasoning preference optimization

RY Pang, W Yuan, H He, K Cho… - Advances in …, 2025 - proceedings.neurips.cc
Iterative preference optimization methods have recently been shown to perform well for
general instruction tuning tasks, but typically make little improvement on reasoning tasks. In …

Datasets for large language models: A comprehensive survey

Y Liu, J Cao, C Liu, K Ding, L Jin - arXiv preprint arXiv:2402.18041, 2024 - arxiv.org
This paper embarks on an exploration of Large Language Model (LLM) datasets,
which play a crucial role in the remarkable advancements of LLMs. The datasets serve as …

Recursive introspection: Teaching language model agents how to self-improve

Y Qu, T Zhang, N Garg… - Advances in Neural …, 2025 - proceedings.neurips.cc
A central piece in enabling intelligent agentic behavior in foundation models is to make them
capable of introspecting upon their behavior, reasoning, and correcting their mistakes as …

Step-DPO: Step-wise preference optimization for long-chain reasoning of LLMs

X Lai, Z Tian, Y Chen, S Yang, X Peng, J Jia - arXiv preprint arXiv …, 2024 - arxiv.org
Mathematical reasoning presents a significant challenge for Large Language Models
(LLMs) due to the extensive and precise chain of reasoning required for accuracy. Ensuring …

LLaMA-Berry: Pairwise optimization for o1-like Olympiad-level mathematical reasoning

D Zhang, J Wu, J Lei, T Che, J Li, T Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents an advanced mathematical problem-solving framework, LLaMA-Berry,
for enhancing the mathematical reasoning ability of Large Language Models (LLMs). The …

DART-Math: Difficulty-aware rejection tuning for mathematical problem-solving

Y Tong, X Zhang, R Wang, R Wu, J He - arXiv preprint arXiv:2407.13690, 2024 - arxiv.org
Solving mathematical problems requires advanced reasoning abilities and presents notable
challenges for large language models. Previous works usually synthesize data from …

Building math agents with multi-turn iterative preference learning

W Xiong, C Shi, J Shen, A Rosenberg, Z Qin… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent studies have shown that large language models' (LLMs) mathematical problem-
solving capabilities can be enhanced by integrating external tools, such as code …

OpenMathInstruct-2: Accelerating AI for math with massive open-source instruction data

S Toshniwal, W Du, I Moshkov, B Kisacanin… - arXiv preprint arXiv …, 2024 - arxiv.org
Mathematical reasoning continues to be a critical challenge in large language model (LLM)
development, attracting significant interest. However, most of the cutting-edge progress in …

AI-assisted generation of difficult math questions

V Shah, D Yu, K Lyu, S Park, J Yu, Y He, NR Ke… - arXiv preprint arXiv …, 2024 - arxiv.org
Current LLM training positions mathematical reasoning as a core capability. With publicly
available sources fully tapped, there is unmet demand for diverse and challenging math …

BlueLM-V-3B: Algorithm and system co-design for multimodal large language models on mobile devices

X Lu, Y Chen, C Chen, H Tan, B Chen, Y Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence and growing popularity of multimodal large language models (MLLMs) have
significant potential to enhance various aspects of daily life, from improving communication …