The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv…, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

Simulating 500 million years of evolution with a language model

T Hayes, R Rao, H Akin, NJ Sofroniew, D Oktay, Z Lin… - Science, 2025 - science.org
More than three billion years of evolution have produced an image of biology encoded into
the space of natural proteins. Here we show that language models trained at scale on …

ReST-MCTS*: LLM self-training via process reward guided tree search

D Zhang, S Zhoubian, Z Hu, Y Yue, Y Dong… - arXiv preprint arXiv…, 2024 - arxiv.org
Recent methodologies in LLM self-training mostly rely on LLM generating responses and
filtering those with correct output answers as training data. This approach often yields a low …
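The filtering step described in this snippet can be sketched as follows. This is a minimal illustration, not the paper's method; `generate` and `extract_answer` are hypothetical stand-ins for an LLM sampler and an answer parser.

```python
# Answer-filtered self-training data collection: sample k responses per
# problem and keep only those whose extracted final answer matches the
# known reference answer. `generate` and `extract_answer` are hypothetical
# stand-ins, not APIs from the paper.

def collect_sft_data(problems, generate, extract_answer, k=4):
    """For each (question, reference) pair, keep sampled responses whose
    extracted final answer equals the reference answer."""
    data = []
    for question, reference in problems:
        for _ in range(k):
            response = generate(question)
            if extract_answer(response) == reference:
                data.append((question, response))
    return data
```

The snippet's point is that this keep-only-correct filter often yields few usable samples on hard problems, which motivates the paper's process-reward-guided search instead.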

Generative verifiers: Reward modeling as next-token prediction

L Zhang, A Hosseini, H Bansal, M Kazemi… - arXiv preprint arXiv…, 2024 - arxiv.org
Verifiers or reward models are often used to enhance the reasoning performance of large
language models (LLMs). A common approach is the Best-of-N method, where N candidate …
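The Best-of-N method mentioned in this snippet can be sketched in a few lines. A minimal sketch under assumed interfaces: `generate` and `reward` are hypothetical stand-ins for an LLM sampler and a scalar reward model, not APIs from the paper.

```python
# Best-of-N selection: sample N candidate answers and return the one the
# reward model scores highest. `generate` and `reward` are hypothetical
# stand-ins for an LLM sampler and a reward model.

def best_of_n(prompt, generate, reward, n=8):
    """Sample n candidates for `prompt`; keep the highest-scoring one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: reward(prompt, c))
```

The paper's contribution is to the `reward` side of this loop: rather than a scalar head, the verifier is itself a generative LM scored via next-token prediction.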

Smaller, weaker, yet better: Training LLM reasoners via compute-optimal sampling

H Bansal, A Hosseini, R Agarwal, VQ Tran… - arXiv preprint arXiv…, 2024 - arxiv.org
Training on high-quality synthetic data from strong language models (LMs) is a common
strategy to improve the reasoning performance of LMs. In this work, we revisit whether this …

O1 Replication Journey: A Strategic Progress Report -- Part 1

Y Qin, X Li, H Zou, Y Liu, S Xia, Z Huang, Y Ye… - arXiv preprint arXiv…, 2024 - arxiv.org
This paper introduces a pioneering approach to artificial intelligence research, embodied in
our O1 Replication Journey. In response to the announcement of OpenAI's groundbreaking …

Building math agents with multi-turn iterative preference learning

W Xiong, C Shi, J Shen, A Rosenberg, Z Qin… - arXiv preprint arXiv…, 2024 - arxiv.org
Recent studies have shown that large language models' (LLMs) mathematical problem-
solving capabilities can be enhanced by integrating external tools, such as code …

Enhancing the reasoning ability of multimodal large language models via mixed preference optimization

W Wang, Z Chen, W Wang, Y Cao, Y Liu, Z Gao… - arXiv preprint arXiv…, 2024 - arxiv.org
Existing open-source multimodal large language models (MLLMs) generally follow a
training process involving pre-training and supervised fine-tuning. However, these models …

Weak-to-strong reasoning

Y Yang, Y Ma, P Liu - arXiv preprint arXiv:2407.13647, 2024 - arxiv.org
When large language models (LLMs) exceed human-level capabilities, it becomes
increasingly challenging to provide full-scale and accurate supervision for these models …

Improve vision language model chain-of-thought reasoning

R Zhang, B Zhang, Y Li, H Zhang, Z Sun, Z Gan… - arXiv preprint arXiv…, 2024 - arxiv.org
Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving
interpretability and trustworthiness. However, current training recipes lack robust CoT …