Unifying KV cache compression for large language models with LeanKV
Y Zhang, Y Hu, R Zhao, J Lui, H Chen - arXiv preprint arXiv:2412.03131, 2024 - arxiv.org
Large language models (LLMs) demonstrate exceptional performance but incur high serving
costs due to substantial memory demands, with the key-value (KV) cache being a primary …
SeerAttention: Learning intrinsic sparse attention in your LLMs
Attention is the cornerstone of modern Large Language Models (LLMs). Yet its quadratic
complexity limits the efficiency and scalability of LLMs, especially for those with a long …
GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism
Traditional Mixture-of-Experts (MoE) networks benefit from utilizing multiple smaller expert
models as opposed to a single large network. However, these experts typically operate …
The Future of AI: Exploring the Potential of Large Concept Models
H Ahmad, D Goel - arXiv preprint arXiv:2501.05487, 2025 - arxiv.org
The field of Artificial Intelligence (AI) continues to drive transformative innovations, with
significant progress in conversational interfaces, autonomous vehicles, and intelligent …
Attention heads of large language models
Z Zheng, Y Wang, Y Huang, S Song, M Yang, B Tang… - Patterns - cell.com
Large language models (LLMs) have demonstrated performance approaching human levels
in tasks such as long-text comprehension and mathematical reasoning, but they remain …
Workshop on Sparsity in LLMs (SLLM): Deep Dive into Mixture of Experts, Quantization, Hardware, and Inference
Large Language Models (LLMs) have emerged as transformative tools in both research and
industry, excelling across a wide array of tasks. However, their growing computational …
Tutorial Proposal: Efficient Inference for Large Language Models–Algorithm, Model, and System
Background. Large Language Models (LLMs) have attracted significant attention from both
academia and industry in recent years. They are revolutionizing many applications …