Efficient training of large language models on distributed infrastructures: a survey

J Duan, S Zhang, Z Wang, L Jiang, W Qu, Q Hu… - arXiv preprint arXiv…, 2024 - arxiv.org
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …

RLHFuse: Efficient RLHF training for large language models with inter- and intra-stage fusion

Y Zhong, Z Zhang, B Wu, S Liu, Y Chen, C Wan… - arXiv preprint arXiv…, 2024 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) enhances the alignment between
LLMs and human preferences. The workflow of RLHF typically involves several models and …

On designing effective RL reward at training time for LLM reasoning

J Gao, S Xu, W Ye, W Liu, C He, W Fu, Z Mei… - arXiv preprint arXiv…, 2024 - arxiv.org
Reward models have been increasingly critical for improving the reasoning capability of
LLMs. Existing research has shown that a well-trained reward model can substantially …