KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing
The development of large language models (LLMs) has significantly expanded model sizes,
resulting in substantial GPU memory requirements during inference. The key and value …
EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization
In recent years, Transformer networks have shown remarkable performance in speech
recognition tasks. However, their deployment poses challenges due to high computational …
UniForm: A Reuse Attention Mechanism Optimized for Efficient Vision Transformers on Edge Devices
Transformer-based architectures have demonstrated remarkable success across various
domains, but their deployment on edge devices remains challenging due to high memory …