SimPO: Simple preference optimization with a reference-free reward
Direct Preference Optimization (DPO) is a widely used offline preference
optimization algorithm that reparameterizes reward functions in reinforcement learning from …
Large language models are effective text rankers with pairwise ranking prompting
Ranking documents using Large Language Models (LLMs) by directly feeding the query and
candidate documents into the prompt is an interesting and practical problem. However …
Direct Nash optimization: Teaching language models to self-improve with general preferences
This paper studies post-training large language models (LLMs) using preference feedback
from a powerful oracle to help a model iteratively improve over itself. The typical approach …
LLM Comparator: Visual analytics for side-by-side evaluation of large language models
Automatic side-by-side evaluation has emerged as a promising approach to evaluating the
quality of responses from large language models (LLMs). However, analyzing the results …
Building math agents with multi-turn iterative preference learning
Recent studies have shown that large language models' (LLMs) mathematical problem-
solving capabilities can be enhanced by integrating external tools, such as code …
A survey on human preference learning for large language models
The recent surge of versatile large language models (LLMs) largely depends on aligning
increasingly capable foundation models with human intentions by preference learning …
Towards a unified view of preference learning for large language models: A survey
Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial
factors to achieve success is aligning the LLM's output with human preferences. This …
Prompt optimization with human feedback
Large language models (LLMs) have demonstrated remarkable performances in various
tasks. However, the performance of LLMs heavily depends on the input prompt, which has …
Alignment of diffusion models: Fundamentals, challenges, and future
Diffusion models have emerged as the leading paradigm in generative modeling, excelling
in various applications. Despite their success, these models often misalign with human …
Filtered direct preference optimization
Reinforcement learning from human feedback (RLHF) plays a crucial role in aligning
language models with human preferences. While the significance of dataset quality is …