Process reinforcement through implicit rewards

G Cui, L Yuan, Z Wang, H Wang, W Li, B He… - arXiv preprint arXiv …, 2025 - arxiv.org
Dense process rewards have proven to be a more effective alternative to sparse outcome-level rewards in the inference-time scaling of large language models (LLMs), particularly in …
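The snippet cuts off, but the core mechanism in this line of work is an implicit process reward: each generated token is scored by the scaled log-likelihood ratio between the trained model and a frozen reference model, so dense per-token credit falls out of outcome-level training without step-level labels. A minimal sketch in Python, assuming per-token log-probabilities have already been extracted; the function name and the beta value are illustrative, not the paper's:

```python
import torch

def implicit_process_rewards(logp_model: torch.Tensor,
                             logp_ref: torch.Tensor,
                             beta: float = 0.05) -> torch.Tensor:
    """Per-token implicit process rewards.

    logp_model, logp_ref: shape (seq_len,) log-probabilities that the
    trained model and a frozen reference model assign to each token of a
    sampled response. Each token's reward is the scaled log-likelihood
    ratio r_t = beta * (log pi(y_t | y_<t) - log pi_ref(y_t | y_<t)).
    """
    return beta * (logp_model - logp_ref)

# Toy usage with made-up log-probs for a 4-token response.
logp_model = torch.tensor([-0.7, -1.2, -0.3, -2.0])
logp_ref = torch.tensor([-0.9, -1.1, -0.8, -2.4])
print(implicit_process_rewards(logp_model, logp_ref))
```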

Critique fine-tuning: Learning to critique is more effective than learning to imitate

Y Wang, X Yue, W Chen - arXiv preprint arXiv:2501.17703, 2025 - arxiv.org
Supervised Fine-Tuning (SFT) is commonly used to train language models to imitate annotated responses for given instructions. In this paper, we challenge this paradigm and …
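Where standard SFT trains on (instruction, gold response) pairs, critique fine-tuning instead trains the model to generate a critique of a candidate response. A minimal sketch of assembling one such training example; the prompt template and field names here are illustrative assumptions, not the paper's format:

```python
def build_cft_example(instruction: str, candidate: str, critique: str) -> dict:
    """One critique-fine-tuning example: the model reads the instruction
    plus a (possibly flawed) candidate response and is trained to emit
    the critique, rather than imitating a gold answer directly."""
    prompt = (
        f"Instruction:\n{instruction}\n\n"
        f"Candidate response:\n{candidate}\n\n"
        "Critique the candidate response:"
    )
    return {"input": prompt, "target": critique}

# Toy usage.
example = build_cft_example(
    instruction="Compute 17 * 24.",
    candidate="17 * 24 = 398.",
    critique="Incorrect: 17 * 24 = 408, since 17 * 20 + 17 * 4 = 340 + 68.",
)
print(example["input"])
print(example["target"])
```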

Examining False Positives under Inference Scaling for Mathematical Reasoning

Y Wang, N Yang, L Wang, F Wei - arXiv preprint arXiv:2502.06217, 2025 - arxiv.org
Recent advancements in language models have led to significant improvements in mathematical reasoning across various benchmarks. However, most of these benchmarks …

Digital Twin Buildings: 3D Modeling, GIS Integration, and Visual Descriptions Using Gaussian Splatting, ChatGPT/Deepseek, and Google Maps Platforms

K Gao, D Lu, L Li, N Chen, H He, L Xu, J Li - arXiv preprint arXiv …, 2025 - arxiv.org
Urban digital twins are virtual replicas of cities that use multi-source data and data analytics to optimize urban planning, infrastructure management, and decision-making. Towards this …