Kimi k1.5: Scaling Reinforcement Learning with LLMs

K Team, A Du, B Gao, B Xing, C Jiang, C Chen… - arXiv preprint arXiv …, 2025 - arxiv.org
Language model pretraining with next token prediction has proved effective for scaling
compute but is limited to the amount of available training data. Scaling reinforcement …

Process reinforcement through implicit rewards

G Cui, L Yuan, Z Wang, H Wang, W Li, B He… - arXiv preprint arXiv …, 2025 - arxiv.org
Dense process rewards have proven a more effective alternative to the sparse outcome-
level rewards in the inference-time scaling of large language models (LLMs), particularly in …

A Survey on Large Language Models with some Insights on their Capabilities and Limitations

A Matarazzo, R Torlone - arXiv preprint arXiv:2501.04040, 2025 - arxiv.org
The rapid advancement of artificial intelligence, particularly with the development of Large
Language Models (LLMs) built on the transformer architecture, has redefined the …

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Y Zuo, S Qu, Y Li, Z Chen, X Zhu, E Hua… - arXiv preprint arXiv …, 2025 - arxiv.org
We introduce MedXpertQA, a highly challenging and comprehensive benchmark to evaluate
expert-level medical knowledge and advanced reasoning. MedXpertQA includes 4,460 …

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

Y Wang, X Yue, W Chen - arXiv preprint arXiv:2501.17703, 2025 - arxiv.org
Supervised Fine-Tuning (SFT) is commonly used to train language models to imitate
annotated responses for given instructions. In this paper, we challenge this paradigm and …

Optimizing Temperature for Language Models with Multi-Sample Inference

W Du, Y Yang, S Welleck - arXiv preprint arXiv:2502.05234, 2025 - arxiv.org
Multi-sample aggregation strategies, such as majority voting and best-of-N sampling, are
widely used in contemporary large language models (LLMs) to enhance predictive accuracy …

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Z Ye, X Zhu, CM Chan, X Wang, X Tan, J Lei… - arXiv preprint arXiv …, 2025 - arxiv.org
Recent advances in text-based large language models (LLMs), particularly in the GPT series
and the o1 model, have demonstrated the effectiveness of scaling both training-time and …

AI-driven materials design: a mini-review

M Cheng, CL Fu, R Okabe, A Chotrattanapituk… - arXiv preprint arXiv …, 2025 - arxiv.org
Materials design is an important component of modern science and technology, yet
traditional approaches rely heavily on trial-and-error and can be inefficient. Computational …

SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities

F Jiang, Z Xu, Y Li, L Niu, Z Xiang, B Li, BY Lin… - arXiv preprint arXiv …, 2025 - arxiv.org
Emerging large reasoning models (LRMs), such as DeepSeek-R1 models, leverage long
chain-of-thought (CoT) reasoning to generate structured intermediate steps, enhancing their …

Improving Video Generation with Human Feedback

J Liu, G Liu, J Liang, Z Yuan, X Liu, M Zheng… - arXiv preprint arXiv …, 2025 - arxiv.org
Video generation has achieved significant advances through rectified flow techniques, but
issues like unsmooth motion and misalignment between videos and prompts persist. In this …