Reinforcement Learning Enhanced LLMs: A Survey
This paper surveys research in the rapidly growing field of enhancing large language
models (LLMs) with reinforcement learning (RL), a technique that enables LLMs to improve …
Towards a unified view of preference learning for large language models: A survey
Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial
factors to achieve success is aligning the LLM's output with human preferences. This …
AceMath: Advancing frontier math reasoning with post-training and reward modeling
In this paper, we introduce AceMath, a suite of frontier math models that excel in solving
complex math problems, along with highly effective reward models capable of evaluating …
Free process rewards without process labels
Different from its counterpart outcome reward models (ORMs), which evaluate the entire
responses, a process reward model (PRM) scores a reasoning trajectory step by step …
JuStRank: Benchmarking LLM Judges for System Ranking
Given the rapid progress of generative AI, there is a pressing need to systematically
compare and choose between the numerous models and configurations available. The …
An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Large Language Models offer new opportunities to devise automated implementation
generation methods that can tackle problem solving activities beyond traditional methods …
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs
Sailor2 is a family of cutting-edge multilingual language models for South-East Asian (SEA)
languages, available in 1B, 8B, and 20B sizes to suit diverse applications. Building on …
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Despite the promising performance of Large Vision Language Models (LVLMs) in visual
understanding, they occasionally generate incorrect outputs. While reward models (RMs) …
Less is More: Improving LLM Alignment via Preference Data Selection
Direct Preference Optimization (DPO) has emerged as a promising approach for aligning
large language models with human preferences. While prior work mainly extends DPO from …
Policy-to-Language: Train LLMs to Explain Decisions with Flow-Matching Generated Rewards
As humans increasingly share environments with diverse agents powered by RL, LLMs, and
beyond, the ability to explain their policies in natural language will be vital for reliable …