Expanding performance boundaries of open-source multimodal models with model, data, and test-time scaling

Z Chen, W Wang, Y Cao, Y Liu, Z Gao, E Cui… - arXiv preprint arXiv…, 2024 - arxiv.org
We introduce InternVL 2.5, an advanced multimodal large language model (MLLM) series
that builds upon InternVL 2.0, maintaining its core model architecture while introducing …

Imitate, explore, and self-improve: A reproduction report on slow-thinking reasoning systems

Y Min, Z Chen, J Jiang, J Chen, J Deng, Y Hu… - arXiv preprint arXiv…, 2024 - arxiv.org
Recently, slow-thinking reasoning systems, such as o1, have demonstrated remarkable
capabilities in solving complex reasoning tasks. These systems typically engage in an …

Process reinforcement through implicit rewards

G Cui, L Yuan, Z Wang, H Wang, W Li, B He… - arXiv preprint arXiv…, 2025 - arxiv.org
Dense process rewards have proven a more effective alternative to the sparse outcome-
level rewards in the inference-time scaling of large language models (LLMs), particularly in …

Critique fine-tuning: Learning to critique is more effective than learning to imitate

Y Wang, X Yue, W Chen - arXiv preprint arXiv:2501.17703, 2025 - arxiv.org
Supervised Fine-Tuning (SFT) is commonly used to train language models to imitate
annotated responses for given instructions. In this paper, we challenge this paradigm and …

Formal mathematical reasoning: A new frontier in AI

K Yang, G Poesia, J He, W Li, K Lauter… - arXiv preprint arXiv…, 2024 - arxiv.org
AI for Mathematics (AI4Math) is not only intriguing intellectually but also crucial for AI-driven
discovery in science, engineering, and beyond. Extensive efforts on AI4Math have mirrored …

LIMO: Less is More for Reasoning

Y Ye, Z Huang, Y Xiao, E Chern, S Xia, P Liu - arXiv preprint arXiv…, 2025 - arxiv.org
We present a fundamental discovery that challenges our understanding of how complex
reasoning emerges in large language models. While conventional wisdom suggests that …

Technical report: Enhancing LLM reasoning with reward-guided tree search

J Jiang, Z Chen, Y Min, J Chen, X Cheng… - arXiv preprint arXiv…, 2024 - arxiv.org
Recently, test-time scaling has garnered significant attention from the research community,
largely due to the substantial advancements of the o1 model released by OpenAI. By …

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

C Lyu, S Gao, Y Gu, W Zhang, J Gao, K Liu… - arXiv preprint arXiv…, 2025 - arxiv.org
Reasoning abilities, especially those for solving complex math problems, are crucial
components of general intelligence. Recent advances by proprietary companies, such as o …

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

B Zhang, K Li, Z Cheng, Z Hu, Y Yuan, G Chen… - arXiv preprint arXiv…, 2025 - arxiv.org
In this paper, we propose VideoLLaMA3, a more advanced multimodal foundation model for
image and video understanding. The core design philosophy of VideoLLaMA3 is vision …

InfiR: Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

C **e, S Cai, W Wang, P Li, Z Sang, K Yang… - arxiv preprint arxiv …, 2025 - arxiv.org
Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have
made significant advancements in reasoning capabilities. However, they still face …