What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

S Dou, H Jia, S Wu, H Zheng, W Zhou, M Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
The increasing development of large language models (LLMs) in code generation has
drawn significant attention among researchers. To enhance LLM-based code generation …

AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data

Z Song, Y Wang, W Zhang, K Liu… - Advances in …, 2025 - proceedings.neurips.cc
Abstract Open-source Large Language Models (LLMs) and their specialized variants,
particularly Code LLMs, have recently delivered impressive performance. However …

TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

R Wang, J Zhang, Y Jia, R Pan, S Diao, R Pi… - arXiv preprint arXiv …, 2024 - arxiv.org
Proving mathematical theorems using computer-verifiable formal languages like Lean
significantly impacts mathematical reasoning. One approach to formal theorem proving …

大语言模型合成数据方法简述 (A Brief Introduction to Synthetic Data for Large Language Model)

L Peiji, M Yichuan, Y Hang - Proceedings of the 23rd Chinese …, 2024 - aclanthology.org
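Abstract "Large language models have attracted enormous attention over the past two years and have sparked wide discussion of artificial general intelligence. Synthetic data is regarded as a crucial component in achieving artificial general intelligence. In this paper, the currently common data synthesis …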