Self-play fine-tuning converts weak language models to strong language models

Z Chen, Y Deng, H Yuan, K Ji, Q Gu - arXiv preprint arXiv:2401.01335, 2024 - arxiv.org
Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is
pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the …

Many-shot in-context learning

R Agarwal, A Singh, L Zhang… - Advances in …, 2025 - proceedings.neurips.cc
Large language models (LLMs) excel at few-shot in-context learning (ICL): learning from a
few examples provided in context at inference, without any weight updates. Newly expanded …

Iterative reasoning preference optimization

RY Pang, W Yuan, H He, K Cho… - Advances in …, 2025 - proceedings.neurips.cc
Iterative preference optimization methods have recently been shown to perform well for
general instruction tuning tasks, but typically make little improvement on reasoning tasks. In …

ReST-MCTS*: LLM self-training via process reward guided tree search

D Zhang, S Zhoubian, Z Hu, Y Yue… - Advances in Neural …, 2025 - proceedings.neurips.cc
Recent methodologies in LLM self-training mostly rely on LLM generating responses and
filtering those with correct output answers as training data. This approach often yields a low …

A survey on knowledge distillation of large language models

X Xu, M Li, C Tao, T Shen, R Cheng, J Li, C Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the era of Large Language Models (LLMs), Knowledge Distillation (KD) emerges as a
pivotal methodology for transferring advanced capabilities from leading proprietary LLMs …

LoRA learns less and forgets less

D Biderman, J Portes, JJG Ortiz, M Paul… - … on Machine Learning …, 2024 - openreview.net
Low-Rank Adaptation (LoRA) is a widely-used parameter-efficient finetuning method for
large language models. LoRA saves memory by training only low-rank perturbations to …

Self-play preference optimization for language model alignment

Y Wu, Z Sun, H Yuan, K Ji, Y Yang, Q Gu - arXiv preprint arXiv:2405.00675, 2024 - arxiv.org
Standard reinforcement learning from human feedback (RLHF) approaches relying on
parametric models like the Bradley-Terry model fall short in capturing the intransitivity and …

DART-Math: Difficulty-aware rejection tuning for mathematical problem-solving

Y Tong, X Zhang, R Wang, R Wu… - Advances in Neural …, 2025 - proceedings.neurips.cc
Solving mathematical problems requires advanced reasoning abilities and presents notable
challenges for large language models. Previous works usually synthesize data from …

Self-training: A survey

MR Amini, V Feofanov, L Pauletto, L Hadjadj… - Neurocomputing, 2025 - Elsevier
Self-training methods have gained significant attention in recent years due to their
effectiveness in leveraging small labeled datasets and large unlabeled observations for …

Training language models to self-correct via reinforcement learning

A Kumar, V Zhuang, R Agarwal, Y Su… - arXiv preprint arXiv …, 2024 - arxiv.org
Self-correction is a highly desirable capability of large language models (LLMs), yet it has
consistently been found to be largely ineffective in modern LLMs. Current methods for …