DART-Math: Difficulty-aware rejection tuning for mathematical problem-solving

Y Tong, X Zhang, R Wang, R Wu… - Advances in Neural …, 2025 - proceedings.neurips.cc
Solving mathematical problems requires advanced reasoning abilities and presents notable
challenges for large language models. Previous works usually synthesize data from …

MAmmoTH2: Scaling instructions from the web

X Yue, T Zheng, G Zhang, W Chen - arxiv preprint arxiv:2405.03548, 2024 - arxiv.org
Instruction tuning improves the reasoning abilities of large language models (LLMs), with
data quality and scalability being the crucial factors. Most instruction tuning data come from …

Internal consistency and self-feedback in large language models: A survey

X Liang, S Song, Z Zheng, H Wang, Q Yu, X Li… - arxiv preprint arxiv …, 2024 - arxiv.org
Large language models (LLMs) often exhibit deficient reasoning or generate hallucinations.
To address these, studies prefixed with" Self-" such as Self-Consistency, Self-Improve, and …

A survey on data synthesis and augmentation for large language models

K Wang, J Zhu, M Ren, Z Liu, S Li, Z Zhang… - arxiv preprint arxiv …, 2024 - arxiv.org
The success of Large Language Models (LLMs) is inherently linked to the availability of vast,
diverse, and high-quality data for training and evaluation. However, the growth rate of high …

A theoretical understanding of self-correction through in-context alignment

Y Wang, Y Wu, Z Wei, S Jegelka, Y Wang - arxiv preprint arxiv …, 2024 - arxiv.org
Going beyond mimicking limited human experiences, recent studies show initial evidence
that, like humans, large language models (LLMs) are capable of improving their abilities …

Exploring automated energy optimization with unstructured building data: A multi-agent based framework leveraging large language models

T Xiao, P Xu - Energy and Buildings, 2024 - Elsevier
The building sector is a significant energy consumer, making building energy optimization
crucial for reducing energy demand. Automating energy optimization tasks eases the …

Self-generated critiques boost reward modeling for language models

Y Yu, Z Chen, A Zhang, L Tan, C Zhu, RY Pang… - arxiv preprint arxiv …, 2024 - arxiv.org
Reward modeling is crucial for aligning large language models (LLMs) with human
preferences, especially in reinforcement learning from human feedback (RLHF). However …

Do not think that much for 2+3=? On the overthinking of o1-like LLMs

X Chen, J Xu, T Liang, Z He, J Pang, D Yu… - arxiv preprint arxiv …, 2024 - arxiv.org
The remarkable performance of models like the OpenAI o1 can be attributed to their ability to
emulate human-like long-time thinking during inference. These models employ extended …

PTD-SQL: Partitioning and targeted drilling with LLMs in text-to-SQL

R Luo, L Wang, B Lin, Z Lin, Y Yang - arxiv preprint arxiv:2409.14082, 2024 - arxiv.org
Large Language Models (LLMs) have emerged as powerful tools for Text-to-SQL tasks,
exhibiting remarkable reasoning capabilities. Different from tasks such as math word …

Evaluating the Evaluator: Measuring LLMs' Adherence to Task Evaluation Instructions

B Murugadoss, C Poelitz, I Drosos, V Le… - arxiv preprint arxiv …, 2024 - arxiv.org
LLMs-as-a-judge is a recently popularized method which replaces human judgements in
task evaluation (Zheng et al. 2024) with automatic evaluation using LLMs. Due to …