The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv…, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

Simulating 500 million years of evolution with a language model

T Hayes, R Rao, H Akin, NJ Sofroniew, D Oktay, Z Lin… - Science, 2025 - science.org
More than three billion years of evolution have produced an image of biology encoded into
the space of natural proteins. Here we show that language models trained at scale on …

ReST-MCTS*: LLM self-training via process reward guided tree search

D Zhang, S Zhoubian, Z Hu, Y Yue, Y Dong… - arXiv preprint arXiv…, 2024 - arxiv.org
Recent methodologies in LLM self-training mostly rely on LLM generating responses and
filtering those with correct output answers as training data. This approach often yields a low …
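The filtering step described in this snippet can be sketched as follows. This is a minimal illustration, not the paper's method; `generate` and `extract_answer` are hypothetical stand-ins for an LLM sampler and an answer parser.

```python
# Answer-filtered self-training data collection: sample k responses per
# problem and keep only those whose extracted final answer matches the
# known reference answer. `generate` and `extract_answer` are hypothetical
# stand-ins, not APIs from the paper.

def collect_sft_data(problems, generate, extract_answer, k=4):
    """For each (question, reference) pair, keep sampled responses whose
    extracted final answer equals the reference answer."""
    data = []
    for question, reference in problems:
        for _ in range(k):
            response = generate(question)
            if extract_answer(response) == reference:
                data.append((question, response))
    return data
```

The snippet's point is that this keep-only-correct filter often yields few usable samples on hard problems, which motivates the paper's process-reward-guided search instead.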

Generative verifiers: Reward modeling as next-token prediction

L Zhang, A Hosseini, H Bansal, M Kazemi… - arXiv preprint arXiv…, 2024 - arxiv.org
Verifiers or reward models are often used to enhance the reasoning performance of large
language models (LLMs). A common approach is the Best-of-N method, where N candidate …
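The Best-of-N method mentioned in this snippet can be sketched in a few lines. A minimal sketch under assumed interfaces: `generate` and `reward` are hypothetical stand-ins for an LLM sampler and a scalar reward model, not APIs from the paper.

```python
# Best-of-N selection: sample N candidate answers and return the one the
# reward model scores highest. `generate` and `reward` are hypothetical
# stand-ins for an LLM sampler and a reward model.

def best_of_n(prompt, generate, reward, n=8):
    """Sample n candidates for `prompt`; keep the highest-scoring one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: reward(prompt, c))
```

The paper's contribution is to the `reward` side of this loop: rather than a scalar head, the verifier is itself a generative LM scored via next-token prediction.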

Smaller, weaker, yet better: Training LLM reasoners via compute-optimal sampling

H Bansal, A Hosseini, R Agarwal, VQ Tran… - arXiv preprint arXiv…, 2024 - arxiv.org
Training on high-quality synthetic data from strong language models (LMs) is a common
strategy to improve the reasoning performance of LMs. In this work, we revisit whether this …

O1 Replication Journey: A Strategic Progress Report -- Part 1

Y Qin, X Li, H Zou, Y Liu, S Xia, Z Huang, Y Ye… - arXiv preprint arXiv…, 2024 - arxiv.org
This paper introduces a pioneering approach to artificial intelligence research, embodied in
our O1 Replication Journey. In response to the announcement of OpenAI's groundbreaking …

Building math agents with multi-turn iterative preference learning

W Xiong, C Shi, J Shen, A Rosenberg, Z Qin… - arXiv preprint arXiv…, 2024 - arxiv.org
Recent studies have shown that large language models' (LLMs) mathematical problem-
solving capabilities can be enhanced by integrating external tools, such as code …

Enhancing the reasoning ability of multimodal large language models via mixed preference optimization

W Wang, Z Chen, W Wang, Y Cao, Y Liu, Z Gao… - arXiv preprint arXiv…, 2024 - arxiv.org
Existing open-source multimodal large language models (MLLMs) generally follow a
training process involving pre-training and supervised fine-tuning. However, these models …

Weak-to-strong reasoning

Y Yang, Y Ma, P Liu - arXiv preprint arXiv:2407.13647, 2024 - arxiv.org
When large language models (LLMs) exceed human-level capabilities, it becomes
increasingly challenging to provide full-scale and accurate supervision for these models …

Improve vision language model chain-of-thought reasoning

R Zhang, B Zhang, Y Li, H Zhang, Z Sun, Z Gan… - arXiv preprint arXiv…, 2024 - arxiv.org
Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving
interpretability and trustworthiness. However, current training recipes lack robust CoT …