DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs

Z Liu, S Zhang, Y Liu, B Liu, Y Yang, Z Wang - arXiv preprint arXiv …, 2024 - arxiv.org
Direct preference learning offers a promising and computation-efficient approach beyond supervised
fine-tuning (SFT) for improving code generation in coding large language models (LMs) …