Mastering diverse domains through world models

D Hafner, J Pasukonis, J Ba, T Lillicrap - arXiv preprint arXiv …, 2023 - arxiv.org
Developing a general algorithm that learns to solve tasks across a wide range of
applications has been a fundamental challenge in artificial intelligence. Although current …

DeepSeekMath: Pushing the limits of mathematical reasoning in open language models

Z Shao, P Wang, Q Zhu, R Xu, J Song, X Bi… - arXiv preprint arXiv …, 2024 - arxiv.org
Mathematical reasoning poses a significant challenge for language models due to its
complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which …

PhysDiff: Physics-guided human motion diffusion model

Y Yuan, J Song, U Iqbal, A Vahdat… - Proceedings of the …, 2023 - openaccess.thecvf.com
Denoising diffusion models hold great promise for generating diverse and realistic human
motions. However, existing motion diffusion models largely disregard the laws of physics in …

RRHF: Rank responses to align language models with human feedback without tears

Z Yuan, H Yuan, C Tan, W Wang, S Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large
language models with human preferences, significantly enhancing the quality of interactions …

Reinforcement learning for fine-tuning text-to-image diffusion models

Y Fan, O Watkins, Y Du, H Liu, M Ryu… - Advances in …, 2024 - proceedings.neurips.cc
Learning from human feedback has been shown to improve text-to-image models. These
techniques first learn a reward function that captures what humans care about in the task …

Training language models to follow instructions with human feedback

L Ouyang, J Wu, X Jiang, D Almeida… - Advances in neural …, 2022 - proceedings.neurips.cc
Making language models bigger does not inherently make them better at following a user's
intent. For example, large language models can generate outputs that are untruthful, toxic, or …

Video PreTraining (VPT): Learning to act by watching unlabeled online videos

B Baker, I Akkaya, P Zhokhov… - Advances in …, 2022 - proceedings.neurips.cc
Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for
training models with broad, general capabilities for text, images, and other modalities …