Direct preference knowledge distillation for large language models

Y Li, Y Gu, L Dong, D Wang, Y Cheng, F Wei - arXiv preprint arXiv …, 2024 - arxiv.org
In the field of large language models (LLMs), Knowledge Distillation (KD) is a critical
technique for transferring capabilities from teacher models to student models. However …

LLaVA-KD: A Framework of Distilling Multimodal Large Language Models

Y Cai, J Zhang, H He, X He, A Tong, Z Gan… - arXiv preprint arXiv …, 2024 - arxiv.org
The success of Large Language Models (LLM) has led researchers to explore Multimodal
Large Language Models (MLLM) for unified visual and linguistic understanding. However …

Exploring and enhancing the transfer of distribution in knowledge distillation for autoregressive language models

J Rao, X Liu, Z Lin, L Ding, J Li, D Tao… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge distillation (KD) is a technique that compresses large teacher models by training
smaller student models to mimic them. The success of KD in auto-regressive language …
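
The snippets above describe standard knowledge distillation: a smaller student is trained to mimic a larger teacher's output distribution. As a point of reference, below is a minimal sketch of the conventional token-level KD objective for autoregressive LMs (forward KL between teacher and student next-token distributions). The temperature and function names are illustrative assumptions, not the specific method of any paper listed here.

```python
import torch
import torch.nn.functional as F

def kd_loss(teacher_logits: torch.Tensor,
            student_logits: torch.Tensor,
            temperature: float = 2.0) -> torch.Tensor:
    """Forward KL distillation loss, KL(p_teacher || q_student).

    Logits are assumed to have shape (batch, seq_len, vocab_size).
    """
    t_logprobs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(teacher || student) per token, then averaged over batch and positions.
    kl = torch.sum(t_logprobs.exp() * (t_logprobs - s_logprobs), dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return (temperature ** 2) * kl.mean()
```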

Dual-Space Knowledge Distillation for Large Language Models

S Zhang, X Zhang, Z Sun, Y Chen, J Xu - arXiv preprint arXiv:2406.17328, 2024 - arxiv.org
Knowledge distillation (KD) is known as a promising solution to compress large language
models (LLMs) via transferring their knowledge to smaller models. During this process, white …

LLM-Neo: Parameter-efficient knowledge distillation for large language models

R Yang, T Wu, J Wang, P Hu, N Wong… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we propose a novel LLM-Neo framework that efficiently transfers knowledge
from a large language model (LLM) teacher to a compact student. Initially, we revisit the …

Agent-DA: Enhancing low-resource event extraction with collaborative multi-agent data augmentation

X Tian, Y Guo, B Ge, X Yuan, H Zhang, Y Yang… - Knowledge-Based …, 2024 - Elsevier
Low-resource event extraction presents a significant challenge in real-world applications,
particularly in domains like pharmaceuticals, military and law, where data is frequently …

Self-Evolution Knowledge Distillation for LLM-based Machine Translation

Y Song, L Ding, C Zan, S Huang - arXiv preprint arXiv:2412.15303, 2024 - arxiv.org
Knowledge distillation (KD) has shown great promise in transferring knowledge from larger
teacher models to smaller student models. However, existing KD strategies for large …

RKLD: Reverse KL-Divergence-based Knowledge Distillation for Unlearning Personal Information in Large Language Models

B Wang, Y Zi, Y Sun, Y Zhao, B Qin - arXiv preprint arXiv:2406.01983, 2024 - arxiv.org
With the passage of the Right to Be Forgotten (RTBF) regulations and the scaling up of
language model training datasets, research on model unlearning in large language models …
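
The title names a reverse KL-divergence objective. For contrast with the forward-KL sketch above, a minimal reverse KL distillation loss, KL(q_student || p_teacher), is shown below; the expectation is taken under the student distribution, which makes the objective mode-seeking rather than mass-covering. How RKLD adapts this for unlearning personal information is not shown, and the function and argument names are assumptions.

```python
import torch
import torch.nn.functional as F

def reverse_kl_loss(teacher_logits: torch.Tensor,
                    student_logits: torch.Tensor) -> torch.Tensor:
    """Reverse KL distillation loss, KL(q_student || p_teacher)."""
    s_logprobs = F.log_softmax(student_logits, dim=-1)
    t_logprobs = F.log_softmax(teacher_logits, dim=-1)
    # Expectation under the student's own distribution (mode-seeking).
    kl = torch.sum(s_logprobs.exp() * (s_logprobs - t_logprobs), dim=-1)
    return kl.mean()
```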

TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models

M Shing, K Misaki, H Bao, S Yokoi, T Akiba - arXiv preprint arXiv …, 2025 - arxiv.org
Causal language models have demonstrated remarkable capabilities, but their size poses
significant challenges for deployment in resource-constrained environments. Knowledge …
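
The title describes a temporally adaptive, interpolated distillation target. A minimal sketch of one plausible reading is given below, assuming the target distribution is a mixture of the (detached) student distribution and the teacher distribution, with a coefficient alpha_t that grows over training so the target gradually shifts from student toward teacher. The schedule and names are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def interpolated_kd_loss(teacher_logits: torch.Tensor,
                         student_logits: torch.Tensor,
                         alpha_t: float) -> torch.Tensor:
    """KD loss against a student/teacher mixture target (illustrative)."""
    t_probs = F.softmax(teacher_logits, dim=-1)
    s_probs = F.softmax(student_logits, dim=-1).detach()
    # Intermediate target; alpha_t is assumed to ramp from 0 toward 1.
    target = (1.0 - alpha_t) * s_probs + alpha_t * t_probs
    s_logprobs = F.log_softmax(student_logits, dim=-1)
    # KL(target || student), averaged over batch and positions.
    kl = torch.sum(target * (torch.log(target + 1e-9) - s_logprobs), dim=-1)
    return kl.mean()
```

A simple choice for the schedule would be a linear ramp, e.g. alpha_t = current_step / total_steps, though the adaptive schedule used in the paper may differ.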

Knowledge Extraction from LLMs for Scalable Historical Data Annotation

F Celli, D Mingazov - Electronics, 2024 - mdpi.com
This paper introduces a novel approach to extract knowledge from large language models
and generate structured historical datasets. We investigate the feasibility and limitations of …