Self-play fine-tuning converts weak language models to strong language models
Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is
pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the …
LoRA learns less and forgets less
Low-Rank Adaptation (LoRA) is a widely-used parameter-efficient finetuning method for
large language models. LoRA saves memory by training only low-rank perturbations to …
Many-shot in-context learning
Large language models (LLMs) excel at few-shot in-context learning (ICL)--learning from a
few examples provided in context at inference, without any weight updates. Newly expanded …
Self-training: A survey
Self-training methods have gained significant attention in recent years due to their
effectiveness in leveraging small labeled datasets and large unlabeled observations for …
ReST-MCTS*: LLM self-training via process reward guided tree search
Recent methodologies in LLM self-training mostly rely on LLM generating responses and
filtering those with correct output answers as training data. This approach often yields a low …
Training language models to self-correct via reinforcement learning
Self-correction is a highly desirable capability of large language models (LLMs), yet it has
consistently been found to be largely ineffective in modern LLMs. Current methods for …
Generative verifiers: Reward modeling as next-token prediction
Verifiers or reward models are often used to enhance the reasoning performance of large
language models (LLMs). A common approach is the Best-of-N method, where N candidate …
LLM2LLM: Boosting LLMs with novel iterative data enhancement
Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast
majority of natural language processing tasks. While many real-world applications still …
Smaller, weaker, yet better: Training LLM reasoners via compute-optimal sampling
Training on high-quality synthetic data from strong language models (LMs) is a common
strategy to improve the reasoning performance of LMs. In this work, we revisit whether this …
A survey on knowledge distillation of large language models
This survey presents an in-depth exploration of knowledge distillation (KD) techniques
within the realm of Large Language Models (LLMs), spotlighting the pivotal role of KD in …