Google Scholar

Results: 3 (0.04 s)

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Search within citing articles

[PDF] arxiv.org

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

C Lyu, S Gao, Y Gu, W Zhang, J Gao, K Liu… - arXiv preprint arXiv …, 2025 - arxiv.org

Reasoning abilities, especially those for solving complex math problems, are crucial
components of general intelligence. Recent advances by proprietary companies, such as o …

Cited by 1

[PDF] arxiv.org

Scaling Test-Time Compute Without Verification or RL is Suboptimal

A Setlur, N Rajaraman, S Levine, A Kumar - arXiv preprint arXiv …, 2025 - arxiv.org

Despite substantial advances in scaling test-time compute, an ongoing debate in the
community is how it should be scaled up to enable continued and efficient improvements …

[PDF] arxiv.org

ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning

Y Yin, G Carenini - arXiv preprint arXiv:2502.04689, 2025 - arxiv.org

Large language models (LLMs) achieve remarkable performance on challenging
benchmarks that are often structured as multiple-choice question-answering (QA) tasks …

All 2 versions
