Qwen2.5-Math technical report: Toward mathematical expert model via self-improvement

A Yang, B Zhang, B Hui, B Gao, B Yu, C Li… - arxiv preprint arxiv …, 2024 - arxiv.org
In this report, we present a series of math-specific large language models: Qwen2.5-Math
and Qwen2.5-Math-Instruct-1.5B/7B/72B. The core innovation of the Qwen2.5 series lies in …

Step-dpo: Step-wise preference optimization for long-chain reasoning of llms

X Lai, Z Tian, Y Chen, S Yang, X Peng, J Jia - arxiv preprint arxiv …, 2024 - arxiv.org
Mathematical reasoning presents a significant challenge for Large Language Models
(LLMs) due to the extensive and precise chain of reasoning required for accuracy. Ensuring …
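
For reference, the title points to a DPO-style preference objective applied to individual reasoning steps rather than whole responses. A rough sketch, assuming the standard DPO loss conditioned on the prompt x and a shared prefix of accepted steps s_{<k}, where y_w / y_l are the preferred and dispreferred candidate steps, \pi_{\mathrm{ref}} the reference policy, and \beta the temperature (the step-level conditioning is an assumption, not taken from the snippet above):

\mathcal{L}_{\text{step}} = -\,\mathbb{E}\Big[\log \sigma\Big(\beta \log \frac{\pi_\theta(y_w \mid x, s_{<k})}{\pi_{\mathrm{ref}}(y_w \mid x, s_{<k})} - \beta \log \frac{\pi_\theta(y_l \mid x, s_{<k})}{\pi_{\mathrm{ref}}(y_l \mid x, s_{<k})}\Big)\Big]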

Numinamath: The largest public dataset in ai4maths with 860k pairs of competition math problems and solutions

J Li, E Beeching, L Tunstall, B Lipkin… - Hugging Face …, 2024 - faculty.bicmr.pku.edu.cn
Numina is an open AI4Maths initiative dedicated to advancing both artificial and human
intelligence in the field of mathematics. In this paper, we present the NuminaMath dataset, a …

Rethinking data selection at scale: Random selection is almost all you need

T Xia, B Yu, K Dang, A Yang, Y Wu, Y Tian… - arxiv preprint arxiv …, 2024 - arxiv.org
Supervised fine-tuning (SFT) is crucial for aligning Large Language Models (LLMs) with
human instructions. The primary goal during SFT is to select a small yet representative …
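
As a concrete illustration of the selection setting above, a minimal sketch of uniform random subset selection for SFT data (the function and variable names are hypothetical, not taken from the paper):

import random

def select_sft_subset(pool, k, seed=0):
    # Uniformly sample k instruction-response pairs from a larger SFT pool;
    # a fixed seed keeps the selection reproducible.
    rng = random.Random(seed)
    return rng.sample(pool, k)

# e.g. subset = select_sft_subset(full_pool, k=1000)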

Sc-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners

Y Yan, J Jiang, Y Liu, Y Cao, X Xu, X Cai… - arxiv preprint arxiv …, 2024 - arxiv.org
Self-correction is a novel method that can stimulate the potential reasoning abilities of large
language models (LLMs). It involves detecting and correcting errors during the inference …

ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models

J Zhang, L Xue, L Song, J Wang, W Huang… - arxiv preprint arxiv …, 2024 - arxiv.org
With the rise of multimodal applications, instruction data has become critical for training
multimodal language models capable of understanding complex image-based queries …

Towards Effective and Efficient Continual Pre-training of Large Language Models

J Chen, Z Chen, J Wang, K Zhou, Y Zhu, J Jiang… - arxiv preprint arxiv …, 2024 - arxiv.org
Continual pre-training (CPT) has been an important approach for adapting language models
to specific domains or tasks. To make the CPT approach more traceable, this paper presents …

Technical report: Enhancing llm reasoning with reward-guided tree search

J Jiang, Z Chen, Y Min, J Chen, X Cheng… - arxiv preprint arxiv …, 2024 - arxiv.org
Recently, test-time scaling has garnered significant attention from the research community,
largely due to the substantial advancements of the o1 model released by OpenAI. By …

Mix-cpt: A domain adaptation framework via decoupling knowledge learning and format alignment

J Jiang, J Li, WX Zhao, Y Song, T Zhang… - arxiv preprint arxiv …, 2024 - arxiv.org
Adapting general large language models (LLMs) to specialized domains presents great
challenges due to varied data distributions. This adaptation typically requires continual pre …

Not Everything is All You Need: Toward Low-Redundant Optimization for Large Language Model Alignment

Z Chen, K Zhou, WX Zhao, J Wang… - Proceedings of the 2024 …, 2024 - aclanthology.org
Large language models (LLMs) still struggle to align with human preferences in
complex tasks and scenarios. They are prone to overfitting to unexpected patterns or …