Efficient large language models: A survey
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …
Distillspec: Improving speculative decoding via knowledge distillation
Speculative decoding (SD) accelerates large language model inference by employing a
faster draft model for generating multiple tokens, which are then verified in parallel by the …
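The draft-then-verify loop this abstract describes can be made concrete with a short sketch. The code below is a generic greedy speculative-decoding loop, not DistillSpec itself; `draft_next` and `target_next` are hypothetical callables standing in for the small draft model and the large target model, and the verification is simulated sequentially (a real implementation verifies all drafted positions in a single batched forward pass of the target model, which is where the speedup comes from).

```python
from typing import Callable, List

def speculative_decode(
    prompt: List[int],
    draft_next: Callable[[List[int]], int],   # fast draft model (assumed interface)
    target_next: Callable[[List[int]], int],  # large target model (assumed interface)
    k: int = 4,                               # tokens drafted per round
    max_new_tokens: int = 32,
) -> List[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. Draft: the fast model proposes k tokens autoregressively.
        drafted: List[int] = []
        for _ in range(k):
            drafted.append(draft_next(tokens + drafted))
        # 2. Verify: check each drafted position against the target model.
        #    Real systems do this in one parallel forward pass; the loop
        #    below only simulates that check.
        n_accept, correction = 0, None
        for i in range(k):
            expected = target_next(tokens + drafted[:i])
            if expected == drafted[i]:
                n_accept += 1
            else:
                correction = expected  # target disagrees: keep its token
                break
        tokens.extend(drafted[:n_accept])
        if correction is not None:
            tokens.append(correction)
    return tokens[: len(prompt) + max_new_tokens]

# Toy usage: both "models" just increment the last token, so every draft
# is accepted and the loop advances k tokens per round.
nxt = lambda seq: seq[-1] + 1
print(speculative_decode([1, 2, 3], nxt, nxt, k=4, max_new_tokens=8))
```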
Survey on knowledge distillation for large language models: methods, evaluation, and application
Large Language Models (LLMs) have showcased exceptional capabilities in various
domains, attracting significant interest from both academia and industry. Despite their …
Model compression and efficient inference for large language models: A survey
Transformer-based large language models have achieved tremendous success. However,
the significant memory and computational costs incurred during the inference process make …
PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning
Recent advancements in large language models (LLMs) have raised concerns about
inference costs, increasing the need for research into model compression. While knowledge …
SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models
J Koo, Y Hwang, Y Kim, T Kang, H Bae… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the success of Large Language Models (LLMs), they still face challenges related to
high inference costs and memory requirements. To address these issues, Knowledge …
Using Advanced LLMs to Enhance Smaller LLMs: An Interpretable Knowledge Distillation Approach
Advanced large language models (LLMs) like GPT-4 or Llama 3 provide superior
performance in complex human-like interactions. But they are costly, or too large for edge …
Red Teaming for Multimodal Large Language Models: A Survey
As Generative AI becomes more prevalent, the vulnerability to security threats grows. This
study conducts a thorough exploration of red teaming methods within the domain of …
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Recently, considerable efforts have been directed towards compressing Large Language
Models (LLMs), which showcase groundbreaking capabilities across diverse applications …
Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMs Using Multi-Teacher Knowledge Distillation
Transferring the reasoning capability from stronger large language models (LLMs) to smaller
ones has been quite appealing, as smaller LLMs are more flexible to deploy with less …