Qwen2.5-Math technical report: Toward mathematical expert model via self-improvement
In this report, we present a series of math-specific large language models: Qwen2.5-Math and Qwen2.5-Math-Instruct-1.5B/7B/72B. The core innovation of the Qwen2.5 series lies in …
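Assuming these checkpoints are published on the Hugging Face Hub, they can be queried with the standard transformers chat APIs. A minimal sketch; the Hub identifier Qwen/Qwen2.5-Math-7B-Instruct and the example prompt are assumptions, not taken from the report:

```python
# Minimal sketch: querying a Qwen2.5-Math-Instruct checkpoint with
# Hugging Face transformers. The model identifier is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Math-7B-Instruct"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Find x if 2x + 3 = 11."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:],
                       skip_special_tokens=True))
```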
Step-DPO: Step-wise preference optimization for long-chain reasoning of LLMs
Mathematical reasoning presents a significant challenge for Large Language Models (LLMs) due to the extensive and precise chain of reasoning required for accuracy. Ensuring …
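The key move in Step-DPO is to apply a DPO-style preference objective at the granularity of individual reasoning steps rather than whole responses. A minimal sketch of such a step-level objective, assuming paired chosen/rejected next steps that share a common prefix of correct steps; the function and tensor names are illustrative, not the paper's code:

```python
# Minimal sketch of a step-wise DPO-style loss (not the paper's exact
# implementation): given a shared prefix of correct steps, prefer the
# chosen next step over the rejected one.
import torch
import torch.nn.functional as F

def step_dpo_loss(logp_chosen, logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Inputs are summed token log-probs of the *next step* under the
    policy and a frozen reference model, conditioned on the same prefix."""
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with fabricated log-probabilities:
loss = step_dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.9]),
                     torch.tensor([-13.0]), torch.tensor([-14.8]))
```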
NuminaMath: The largest public dataset in AI4Maths with 860k pairs of competition math problems and solutions
Numina is an open AI4Maths initiative dedicated to advancing both artificial and human intelligence in the field of mathematics. In this paper, we present the NuminaMath dataset, a …
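As a public dataset of problem-solution pairs, NuminaMath can presumably be consumed with the Hugging Face datasets library. A minimal sketch; the Hub identifier AI-MO/NuminaMath-CoT and the field names are assumptions:

```python
# Minimal sketch: loading NuminaMath problem-solution pairs with the
# Hugging Face `datasets` library. Identifier and fields are assumptions.
from datasets import load_dataset

ds = load_dataset("AI-MO/NuminaMath-CoT", split="train")  # assumed ID
example = ds[0]
print(example["problem"])   # competition-style problem statement
print(example["solution"])  # worked solution (assumed field names)
```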
Rethinking data selection at scale: Random selection is almost all you need
Supervised fine-tuning (SFT) is crucial for aligning Large Language Models (LLMs) with human instructions. The primary goal during SFT is to select a small yet representative …
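The baseline the title endorses is easy to state precisely: draw the SFT subset uniformly at random from the candidate pool rather than with a learned scoring heuristic. A minimal sketch; the file path and record layout are hypothetical:

```python
# Minimal sketch of the baseline the title refers to: selecting an SFT
# subset uniformly at random from a pool of instruction-response pairs.
import json
import random

def random_select(pool_path, k, seed=42):
    """Sample k records uniformly from a JSONL pool of SFT examples."""
    with open(pool_path) as f:
        pool = [json.loads(line) for line in f]
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    return rng.sample(pool, k)

# e.g. subset = random_select("sft_pool.jsonl", k=10_000)
```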
SC-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners
Self-correction is a novel method that can stimulate the potential reasoning abilities of large language models (LLMs). It involves detecting and correcting errors during the inference …
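The described mechanism, detecting and correcting errors step by step during inference, can be sketched as a generic loop. This illustrates the general idea only, not SC-Math's exact procedure; generate_step, verify_step, and revise_step are hypothetical model-backed helpers:

```python
# Generic step-level self-correction loop (an illustration of the idea,
# not SC-Math's exact mechanism).
def solve_with_self_correction(problem, generate_step, verify_step,
                               revise_step, max_steps=20, max_retries=2):
    steps = []
    for _ in range(max_steps):
        step = generate_step(problem, steps)
        for _ in range(max_retries):
            if verify_step(problem, steps, step):   # step judged sound?
                break
            step = revise_step(problem, steps, step)  # correct the error
        steps.append(step)
        if step.endswith("[DONE]"):  # hypothetical end-of-solution marker
            break
    return steps
```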
ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models
With the rise of multimodal applications, instruction data has become critical for training multimodal language models capable of understanding complex image-based queries …
Towards Effective and Efficient Continual Pre-training of Large Language Models
Continual pre-training (CPT) has been an important approach for adapting language models to specific domains or tasks. To make the CPT approach more traceable, this paper presents …
Technical report: Enhancing LLM reasoning with reward-guided tree search
Recently, test-time scaling has garnered significant attention from the research community, largely due to the substantial advancements of the o1 model released by OpenAI. By …
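Reward-guided tree search generally means proposing candidate next reasoning steps and prioritizing partial chains by a reward model's score. A minimal best-first sketch under that reading; expand, reward, and is_final are hypothetical callables, and this is not necessarily the report's exact algorithm:

```python
# Minimal sketch of reward-guided tree search over reasoning steps
# (a generic best-first variant). `expand` proposes candidate next
# steps; `reward` scores partial chains; `is_final` detects completion.
import heapq
from itertools import count

def reward_guided_search(problem, expand, reward, is_final,
                         beam_width=4, max_expansions=100):
    tie = count()  # tiebreaker so the heap never compares step lists
    # Max-heap via negated scores; entries are (neg_reward, tie, steps).
    frontier = [(-reward(problem, []), next(tie), [])]
    for _ in range(max_expansions):
        if not frontier:
            break
        neg_score, _, steps = heapq.heappop(frontier)
        if is_final(steps):
            return steps  # highest-reward complete chain found so far
        for step in expand(problem, steps, n=beam_width):
            new_steps = steps + [step]
            heapq.heappush(frontier,
                           (-reward(problem, new_steps), next(tie), new_steps))
    return None
```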
Mix-CPT: A domain adaptation framework via decoupling knowledge learning and format alignment
Adapting general large language models (LLMs) to specialized domains presents great challenges due to varied data distributions. This adaptation typically requires continual pre …
Not Everything is All You Need: Toward Low-Redundant Optimization for Large Language Model Alignment
Large language models (LLMs) still struggle to align with human preferences in complex tasks and scenarios. They are prone to overfitting to unexpected patterns or …