Self-play fine-tuning converts weak language models to strong language models
Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is
pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the …
Many-shot in-context learning
Large language models (LLMs) excel at few-shot in-context learning (ICL)--learning from a
few examples provided in context at inference, without any weight updates. Newly expanded …
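The snippet above describes in-context learning: demonstrations are placed in the prompt at inference time and the model's weights stay fixed. A minimal sketch of that prompt construction, with illustrative function and variable names that are not taken from the paper:

    def build_icl_prompt(examples, query):
        # Few- or many-shot ICL: worked examples are concatenated directly into
        # the context; no gradient updates are applied to the model.
        shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
        return f"{shots}\n\nQ: {query}\nA:"

Many-shot ICL simply scales the number of examples up to whatever the expanded context window allows.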
Iterative reasoning preference optimization
Iterative preference optimization methods have recently been shown to perform well for
general instruction tuning tasks, but typically make little improvement on reasoning tasks. In …
ReST-MCTS*: LLM self-training via process reward guided tree search
Recent methodologies in LLM self-training mostly rely on LLM generating responses and
filtering those with correct output answers as training data. This approach often yields a low …
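For context on the approach this abstract criticizes, here is a minimal sketch of answer-filtered self-training (keep only responses whose final answer matches the reference); the function and parameter names are assumptions for illustration, not the paper's API:

    def answer_filtered_self_training_data(model, problems, references,
                                           generate, extract_answer):
        # Baseline self-training: sample several responses per problem and keep
        # only those whose extracted final answer matches the reference answer.
        kept = []
        for problem, reference in zip(problems, references):
            for response in generate(model, problem, n_samples=8):
                if extract_answer(response) == reference:
                    kept.append((problem, response))
        return kept  # reused as supervised fine-tuning data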
A survey on knowledge distillation of large language models
In the era of Large Language Models (LLMs), Knowledge Distillation (KD) emerges as a
pivotal methodology for transferring advanced capabilities from leading proprietary LLMs …
LoRA learns less and forgets less
Low-Rank Adaptation (LoRA) is a widely-used parameter-efficient finetuning method for
large language models. LoRA saves memory by training only low-rank perturbations to …
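A minimal sketch of the low-rank perturbation idea the snippet describes, assuming a PyTorch-style linear layer; the class name, rank, and scaling are illustrative choices, not the paper's exact setup:

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # pretrained weight W0 stays frozen
            # Only the low-rank factors A and B are trainable.
            self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, rank))
            self.scale = alpha / rank

        def forward(self, x):
            # y = x W0^T + scale * (x A^T) B^T, i.e. W0 plus a rank-r perturbation B A
            return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)

The memory saving comes from the optimizer state: gradients and moments are kept only for A and B (rank * (d_in + d_out) parameters per adapted layer), while W0 is frozen.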
Self-play preference optimization for language model alignment
Standard reinforcement learning from human feedback (RLHF) approaches relying on
parametric models like the Bradley-Terry model fall short in capturing the intransitivity and …
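For context on the parametric model the snippet refers to, a minimal sketch of the Bradley-Terry preference probability (illustrative code, not from the paper):

    import math

    def bradley_terry_prob(r_a: float, r_b: float) -> float:
        # P(a preferred over b) = sigmoid(r_a - r_b) under the Bradley-Terry model,
        # where r_a and r_b are scalar rewards assigned to the two responses.
        return 1.0 / (1.0 + math.exp(-(r_a - r_b)))

Because every response is reduced to a single scalar reward, the induced preferences are always transitive, which is the limitation (intransitivity) the abstract points to.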
DART-Math: Difficulty-aware rejection tuning for mathematical problem-solving
Solving mathematical problems requires advanced reasoning abilities and presents notable
challenges for large language models. Previous works usually synthesize data from …
Self-training: A survey
Self-training methods have gained significant attention in recent years due to their
effectiveness in leveraging small labeled datasets and large unlabeled observations for …
Training language models to self-correct via reinforcement learning
Self-correction is a highly desirable capability of large language models (LLMs), yet it has
consistently been found to be largely ineffective in modern LLMs. Current methods for …