FACT: FFN-Attention Co-Optimized Transformer Architecture with Eager Correlation Prediction
Transformer models are becoming prevalent in various AI applications thanks to their outstanding
performance. However, the high cost of computation and memory footprint makes their …
SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling
H Wang, J Fang, X Tang, Z Yue, J Li… - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org
Benefiting from the self-attention mechanism, Transformer models have attained impressive
contextual comprehension capabilities for lengthy texts. The requirements of high …
SG-Float: Achieving Memory Access and Computing Power Reduction Using Self-Gating Float in CNNs
Convolutional neural networks (CNNs) are essential for advancing the field of artificial
intelligence. However, since these networks are highly demanding in terms of memory and …
SySMOL: A Hardware-software Co-design Framework for Ultra-Low and Fine-Grained Mixed-Precision Neural Networks
C Zhou, V Richard, P Savarese, Z Hassman… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advancements in quantization and mixed-precision techniques offer significant
promise for improving the run-time and energy efficiency of neural networks. In this work, we …
BitWave: Exploiting Column-Based Bit-Level Sparsity for Deep Learning Acceleration
Bit-serial computation facilitates bit-wise sequential data processing, offering numerous
benefits, such as a reduced area footprint and dynamically adaptive computational …
Pianissimo: A Sub-mW Class DNN Accelerator With Progressively Adjustable Bit-Precision
With the widespread adoption of edge AI, the diversity of application requirements and
fluctuating computational demands present significant challenges. Conventional …
Progressive Variable Precision DNN With Bitwise Ternary Accumulation
Progressive variable precision networks are capable of adapting to changing computational
needs over time using a single weight set. However, previous works have two problems: 1) …