Unifying KV Cache Compression for Large Language Models with LeanKV

Y Zhang, Y Hu, R Zhao, J Lui, H Chen - arXiv preprint arXiv:2412.03131, 2024 - arxiv.org
Large language models (LLMs) demonstrate exceptional performance but incur high serving
costs due to substantial memory demands, with the key-value (KV) cache being a primary …

SeerAttention: Learning intrinsic sparse attention in your LLMs

Y Gao, Z Zeng, D Du, S Cao, HKH So, T Cao… - arXiv preprint arXiv …, 2024 - arxiv.org
Attention is the cornerstone of modern Large Language Models (LLMs). Yet its quadratic
complexity limits the efficiency and scalability of LLMs, especially for those with a long …

GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism

C Tang, B Lv, Z Zheng, B Yang, K Zhao, N Liao… - arXiv preprint arXiv …, 2025 - arxiv.org
Traditional Mixture-of-Experts (MoE) networks benefit from utilizing multiple smaller expert
models as opposed to a single large network. However, these experts typically operate …

The Future of AI: Exploring the Potential of Large Concept Models

H Ahmad, D Goel - arXiv preprint arXiv:2501.05487, 2025 - arxiv.org
The field of Artificial Intelligence (AI) continues to drive transformative innovations, with
significant progress in conversational interfaces, autonomous vehicles, and intelligent …

Attention heads of large language models

Z Zheng, Y Wang, Y Huang, S Song, M Yang, B Tang… - Patterns - cell.com
Large language models (LLMs) have demonstrated performance approaching human levels
in tasks such as long-text comprehension and mathematical reasoning, but they remain …

Workshop on Sparsity in LLMs (SLLM): Deep Dive into Mixture of Experts, Quantization, Hardware, and Inference

T Chen, U Evci, Y Ioannou, B Isik, S Liu… - ICLR 2025 Workshop … - openreview.net
Large Language Models (LLMs) have emerged as transformative tools in both research and
industry, excelling across a wide array of tasks. However, their growing computational …

Tutorial Proposal: Efficient Inference for Large Language Models – Algorithm, Model, and System

X Ning, G Dai, H Bai, L Hou, Y Wang, Q Liu - nics-effalg.com
Background. Large Language Models (LLMs) have attracted significant attention from both
academia and industry in recent years. They are revolutionizing many applications …