The Llama 3 herd of models
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …
Simulating 500 million years of evolution with a language model
More than three billion years of evolution have produced an image of biology encoded into
the space of natural proteins. Here we show that language models trained at scale on …
ReST-MCTS*: LLM self-training via process reward guided tree search
Recent methodologies in LLM self-training mostly rely on LLM generating responses and
filtering those with correct output answers as training data. This approach often yields a low …
Generative verifiers: Reward modeling as next-token prediction
Verifiers or reward models are often used to enhance the reasoning performance of large
language models (LLMs). A common approach is the Best-of-N method, where N candidate …
Smaller, weaker, yet better: Training LLM reasoners via compute-optimal sampling
Training on high-quality synthetic data from strong language models (LMs) is a common
strategy to improve the reasoning performance of LMs. In this work, we revisit whether this …
O1 Replication Journey: A Strategic Progress Report - Part 1
This paper introduces a pioneering approach to artificial intelligence research, embodied in
our O1 Replication Journey. In response to the announcement of OpenAI's groundbreaking …
Building math agents with multi-turn iterative preference learning
Recent studies have shown that large language models' (LLMs) mathematical problem-
solving capabilities can be enhanced by integrating external tools, such as code …
Enhancing the reasoning ability of multimodal large language models via mixed preference optimization
Existing open-source multimodal large language models (MLLMs) generally follow a
training process involving pre-training and supervised fine-tuning. However, these models …
Weak-to-strong reasoning
When large language models (LLMs) exceed human-level capabilities, it becomes
increasingly challenging to provide full-scale and accurate supervision for these models …
Improve vision language model chain-of-thought reasoning
Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving
interpretability and trustworthiness. However, current training recipes lack robust CoT …