Machine learning at the network edge: A survey
Resource-constrained IoT devices, such as sensors and actuators, have become ubiquitous
in recent years. This has led to the generation of large quantities of data in real time, which …
A survey of techniques for optimizing transformer inference
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …
Spatten: Efficient sparse attention architecture with cascade token and head pruning
The attention mechanism is becoming increasingly popular in Natural Language Processing
(NLP) applications, showing superior performance compared to convolutional and recurrent …
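As a rough illustration of the token-pruning idea named in the title, the sketch below ranks key/value tokens by the cumulative attention they receive and keeps only the most-attended ones. It is a minimal, generic example; the function and parameter names (prune_tokens, keep_ratio) are hypothetical and do not reflect SpAtten's actual hardware pipeline or cascade schedule.

```python
# Hedged sketch: token pruning driven by cumulative attention importance.
# Names (prune_tokens, keep_ratio) are assumptions for illustration only.
import numpy as np

def prune_tokens(attn_probs: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """attn_probs: (heads, seq_len, seq_len) softmax attention of one layer.
    Returns sorted indices of tokens to keep, ranked by attention received."""
    # Importance of each key/value token = attention it receives,
    # summed over all heads and all query positions.
    importance = attn_probs.sum(axis=(0, 1))          # shape: (seq_len,)
    k = max(1, int(keep_ratio * importance.shape[0]))
    keep = np.argsort(importance)[-k:]                # top-k most attended tokens
    return np.sort(keep)

# Toy usage: 4 heads, 8 tokens, rows normalized as a stand-in for softmax.
rng = np.random.default_rng(0)
scores = rng.random((4, 8, 8))
probs = scores / scores.sum(axis=-1, keepdims=True)
print(prune_tokens(probs, keep_ratio=0.5))
```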
Hawq-v3: Dyadic neural network quantization
Current low-precision quantization algorithms often have the hidden cost of conversion back
and forth from floating point to quantized integer values. This hidden cost limits the latency …
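The snippet's point about conversion cost can be made concrete with a small sketch: simulated ("fake") quantization round-trips every tensor through floating point, whereas a dyadic rescale (integer multiply plus bit shift) stays in integer arithmetic. This is a generic illustration under assumed names (fake_quant, dyadic_rescale), not HAWQ-V3's actual implementation.

```python
# Minimal sketch: the float <-> int round trip of fake quantization vs. a
# dyadic rescale by b / 2**c using only integer multiply and shift.
# Function names are hypothetical illustrations.
import numpy as np

def fake_quant(x: np.ndarray, scale: float, bits: int = 8) -> np.ndarray:
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)  # float -> int
    return q * scale                                   # int -> float again (the hidden cost)

def dyadic_rescale(acc: np.ndarray, b: int, c: int) -> np.ndarray:
    """Rescale an int32 accumulator by the dyadic number b / 2**c
    without ever touching floating point."""
    return (acc.astype(np.int64) * b) >> c

x = np.array([0.31, -1.2, 0.05])
print(fake_quant(x, scale=0.02))
print(dyadic_rescale(np.array([1000, -2400], dtype=np.int32), b=77, c=10))  # ~= * 0.0752
```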
Olive: Accelerating large language models via hardware-friendly outlier-victim pair quantization
Transformer-based large language models (LLMs) have achieved great success with the
growing model size. LLMs' size grows by 240× every two years, which outpaces the …
Fact: Ffn-attention co-optimized transformer architecture with eager correlation prediction
The Transformer model is becoming prevalent in various AI applications with its outstanding
performance. However, the high cost of computation and memory footprint make its …
Ant: Exploiting adaptive numerical data type for low-bit deep neural network quantization
Quantization is a technique to reduce the computation and memory cost of DNN models,
which are getting increasingly large. Existing quantization solutions use fixed-point integer …
Optimizing Selective Protection for CNN Resilience
As CNNs are being extensively employed in high-performance and safety-critical
applications that demand high reliability, it is important to ensure that they are resilient to …
Unleashing the Potential of Spiking Neural Networks with Dynamic Confidence
This paper presents a new methodology to alleviate the fundamental trade-off between
accuracy and latency in spiking neural networks (SNNs). The approach involves decoding …
Energon: Toward efficient acceleration of transformers using dynamic sparse attention
In recent years, transformer models have revolutionized natural language processing (NLP)
and shown promising performance on computer vision (CV) tasks. Despite their …
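To illustrate the general idea behind dynamic sparse attention, the sketch below keeps, for each query, only the top-k keys by score and applies softmax over that subset. It is a minimal example of the concept, not Energon's mix-precision filtering pipeline; the function name and top_k parameter are assumptions.

```python
# Hedged sketch: per-query top-k sparse attention. Names are assumed for illustration.
import numpy as np

def sparse_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, top_k: int) -> np.ndarray:
    """q, k, v: (seq_len, d). Returns (seq_len, d) attention output."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                     # (seq, seq) raw attention scores
    # Dynamically mask all but the top_k scores in each row.
    kth = np.partition(scores, -top_k, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the surviving keys
    return weights @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((6, 16)); k = rng.standard_normal((6, 16)); v = rng.standard_normal((6, 16))
print(sparse_attention(q, k, v, top_k=2).shape)       # (6, 16)
```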