Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …

Distillspec: Improving speculative decoding via knowledge distillation

Y Zhou, K Lyu, AS Rawat, AK Menon… - arXiv preprint arXiv …, 2023 - arxiv.org
Speculative decoding (SD) accelerates large language model inference by employing a
faster draft model for generating multiple tokens, which are then verified in parallel by the …
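As a rough illustration of the draft-then-verify loop described in this snippet (not the DistillSpec method itself), the sketch below uses simple greedy verification; `draft_next` and `target_argmax` are assumed placeholder callables standing in for the draft and target models, and a real implementation would verify with a single batched forward pass and use rejection sampling when decoding stochastically.

```python
# A minimal greedy-verification sketch of speculative decoding (illustration only).
# `draft_next` and `target_argmax` are hypothetical stand-ins for the two models.
from typing import Callable, List

def speculative_decode(
    prefix: List[int],
    draft_next: Callable[[List[int]], int],           # cheap draft model: greedy next token
    target_argmax: Callable[[List[int]], List[int]],  # target model: greedy choice after every prefix length
    num_draft: int = 4,
    max_new_tokens: int = 32,
    eos_id: int = 0,
) -> List[int]:
    out = list(prefix)
    while len(out) - len(prefix) < max_new_tokens:
        # 1. The draft model proposes a short block of tokens autoregressively.
        draft, ctx = [], list(out)
        for _ in range(num_draft):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. The target model scores the whole block at once:
        #    verified[i] is its greedy token given the first i + 1 tokens of (out + draft).
        verified = target_argmax(out + draft)
        # 3. Accept draft tokens while they agree with the target's own choices.
        accepted = 0
        for i, t in enumerate(draft):
            if verified[len(out) + i - 1] == t:
                accepted += 1
            else:
                break
        out.extend(draft[:accepted])
        # 4. Append one token from the target: the correction at the first mismatch,
        #    or a bonus token when every draft token was accepted.
        out.append(verified[len(out) - 1])
        if out[-1] == eos_id:
            break
    return out
```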

Survey on knowledge distillation for large language models: methods, evaluation, and application

C Yang, Y Zhu, W Lu, Y Wang, Q Chen, C Gao… - ACM Transactions on …, 2024 - dl.acm.org
Large Language Models (LLMs) have showcased exceptional capabilities in various
domains, attracting significant interest from both academia and industry. Despite their …

Model compression and efficient inference for large language models: A survey

W Wang, W Chen, Y Luo, Y Long, Z Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer-based large language models have achieved tremendous success. However,
the significant memory and computational costs incurred during the inference process make …

PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning

G Kim, D Jang, E Yang - arXiv preprint arXiv:2402.12842, 2024 - arxiv.org
Recent advancements in large language models (LLMs) have raised concerns about
inference costs, increasing the need for research into model compression. While knowledge …

SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models

J Koo, Y Hwang, Y Kim, T Kang, H Bae… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the success of Large Language Models (LLMs), they still face challenges related to
high inference costs and memory requirements. To address these issues, Knowledge …

Using Advanced LLMs to Enhance Smaller LLMs: An Interpretable Knowledge Distillation Approach

T Wang, K Sudhir, D Hong - arXiv preprint arXiv:2408.07238, 2024 - arxiv.org
Advanced large language models (LLMs) like GPT-4 or Llama 3 provide superior
performance in complex human-like interactions. But they are costly, or too large for edge …

Red Teaming for Multimodal Large Language Models: A Survey

M Mahato, A Kumar, K Singh, B Kukreja, J Nabi - Authorea Preprints, 2024 - techrxiv.org
As Generative AI becomes more prevalent, the vulnerability to security threats grows. This
study conducts a thorough exploration of red teaming methods within the domain of …

Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

Y Park, J Hyun, SL Cho, B Sim, JW Lee - arXiv preprint arXiv:2402.10517, 2024 - arxiv.org
Recently, considerable efforts have been directed towards compressing Large Language
Models (LLMs), which showcase groundbreaking capabilities across diverse applications …

Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMs Using Multi-Teacher Knowledge Distillation

Y Tian, Y Han, X Chen, W Wang, NV Chawla - 2024 - yikunhan.me
Transferring the reasoning capability from stronger large language models (LLMs) to smaller
ones has been quite appealing, as smaller LLMs are more flexible to deploy with less …