Self-play fine-tuning converts weak language models to strong language models

Z Chen, Y Deng, H Yuan, K Ji, Q Gu - arXiv preprint arXiv:2401.01335, 2024 - arxiv.org
Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is
pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the …

LoRA learns less and forgets less

D Biderman, J Portes, JJG Ortiz, M Paul… - arXiv preprint arXiv …, 2024 - arxiv.org
Low-Rank Adaptation (LoRA) is a widely-used parameter-efficient finetuning method for
large language models. LoRA saves memory by training only low rank perturbations to …
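
To make the mechanism concrete, here is a minimal sketch of a LoRA-style layer in PyTorch, assuming a frozen linear projection; the class name LoRALinear and the rank/alpha values are illustrative choices, not details taken from the paper.

# Sketch of a LoRA-style layer: the pretrained weight is frozen and only the
# low-rank perturbation B @ A (scaled by alpha / r) is trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)        # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))  # zero init: update starts at zero
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(768, 768)
out = layer(torch.randn(2, 768))  # gradients flow only through A and B

With r = 8 and d_in = d_out = 768, this trains roughly 12K parameters per layer instead of about 590K, which is where the memory savings come from.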

Many-shot in-context learning

R Agarwal, A Singh, LM Zhang, B Bohnet… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) excel at few-shot in-context learning (ICL)--learning from a
few examples provided in context at inference, without any weight updates. Newly expanded …
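
As a rough illustration of what "many-shot" prompting amounts to in practice, the sketch below stacks hundreds of solved examples into the context ahead of the query; the Q/A formatting and the commented-out call_model function are assumptions, not the paper's setup.

# Build a many-shot prompt: solved examples are concatenated before the test
# question, and the model adapts in-context with no weight updates.
def build_many_shot_prompt(examples, query):
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\n\nQ: {query}\nA:"

examples = [("2 + 2 = ?", "4"), ("3 * 5 = ?", "15")] * 128  # hundreds of shots, not just a few
prompt = build_many_shot_prompt(examples, "7 * 6 = ?")
# answer = call_model(prompt)  # hypothetical inference call against a long-context model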

Self-training: A survey

MR Amini, V Feofanov, L Pauletto, L Hadjadj… - Neurocomputing, 2025 - Elsevier
Self-training methods have gained significant attention in recent years due to their
effectiveness in leveraging small labeled datasets and large unlabeled observations for …
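
A compact sketch of the classic pseudo-labeling loop that self-training methods build on, using scikit-learn; the logistic-regression model, confidence threshold, and round count are illustrative assumptions.

# Classic self-training: fit on the small labeled set, pseudo-label the unlabeled
# points the model is confident about, fold them in, and refit.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, threshold=0.95, rounds=5):
    X, y = X_lab.copy(), y_lab.copy()
    model = LogisticRegression(max_iter=1000).fit(X, y)
    for _ in range(rounds):
        if len(X_unlab) == 0:
            break
        probs = model.predict_proba(X_unlab)
        confident = probs.max(axis=1) >= threshold
        if not confident.any():
            break
        X = np.vstack([X, X_unlab[confident]])
        y = np.concatenate([y, model.classes_[probs[confident].argmax(axis=1)]])
        X_unlab = X_unlab[~confident]
        model = LogisticRegression(max_iter=1000).fit(X, y)
    return model

scikit-learn also ships a SelfTrainingClassifier wrapper (sklearn.semi_supervised) that implements essentially this loop around any probabilistic estimator.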

ReST-MCTS*: LLM self-training via process reward guided tree search

D Zhang, S Zhoubian, Z Hu, Y Yue, Y Dong… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent methodologies in LLM self-training mostly rely on LLM generating responses and
filtering those with correct output answers as training data. This approach often yields a low …
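
The generate-then-filter recipe described in this snippet (as opposed to the paper's process-reward-guided tree search) can be sketched roughly as follows; sample_fn and answer_of stand in for an actual LLM sampler and answer parser.

# Rejection-sampling style self-training: sample several candidate solutions per
# problem, keep only those whose final answer matches the reference, and reuse
# the survivors as supervised fine-tuning data.
def build_self_training_set(problems, sample_fn, answer_of, n_samples=8):
    dataset = []
    for prob in problems:
        for _ in range(n_samples):
            solution = sample_fn(prob["question"])
            if answer_of(solution) == prob["answer"]:
                dataset.append({"prompt": prob["question"], "completion": solution})
    return dataset

# Toy usage with stub functions standing in for the model and the parser.
problems = [{"question": "2 + 3 = ?", "answer": "5"}]
data = build_self_training_set(
    problems,
    sample_fn=lambda q: "2 + 3 = 5",
    answer_of=lambda s: s.split("=")[-1].strip(),
)
# data would then be passed to a supervised fine-tuning step.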

Training language models to self-correct via reinforcement learning

A Kumar, V Zhuang, R Agarwal, Y Su… - arXiv preprint arXiv …, 2024 - arxiv.org
Self-correction is a highly desirable capability of large language models (LLMs), yet it has
consistently been found to be largely ineffective in modern LLMs. Current methods for …

Generative verifiers: Reward modeling as next-token prediction

L Zhang, A Hosseini, H Bansal, M Kazemi… - arXiv preprint arXiv …, 2024 - arxiv.org
Verifiers or reward models are often used to enhance the reasoning performance of large
language models (LLMs). A common approach is the Best-of-N method, where N candidate …
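
For reference, a minimal sketch of the Best-of-N selection scheme the snippet mentions: draw N candidates and return the one the verifier scores highest. sample_fn and score_fn are placeholders for an actual generator and reward model.

# Best-of-N: sample N candidate answers and keep the one with the highest
# verifier / reward-model score.
import random

def best_of_n(prompt, sample_fn, score_fn, n=8):
    candidates = [sample_fn(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score_fn(prompt, c))

# Toy usage with stand-in functions for the generator and the verifier.
answer = best_of_n(
    "What is 12 * 12?",
    sample_fn=lambda p: random.choice(["144", "124", "142"]),
    score_fn=lambda p, c: 1.0 if c == "144" else 0.0,
)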

LLM2LLM: Boosting LLMs with novel iterative data enhancement

N Lee, T Wattanawong, S Kim, K Mangalam… - arXiv preprint arXiv …, 2024 - arxiv.org
Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast
majority of natural language processing tasks. While many real-world applications still …

Smaller, weaker, yet better: Training LLM reasoners via compute-optimal sampling

H Bansal, A Hosseini, R Agarwal, VQ Tran… - arXiv preprint arXiv …, 2024 - arxiv.org
Training on high-quality synthetic data from strong language models (LMs) is a common
strategy to improve the reasoning performance of LMs. In this work, we revisit whether this …

A survey on knowledge distillation of large language models

X Xu, M Li, C Tao, T Shen, R Cheng, J Li, C Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey presents an in-depth exploration of knowledge distillation (KD) techniques
within the realm of Large Language Models (LLMs), spotlighting the pivotal role of KD in …