SPViT: Enabling faster vision transformers via latency-aware soft token pruning

Z Kong, P Dong, X Ma, X Meng, W Niu, M Sun… - European conference on …, 2022 - Springer
Abstract Recently, Vision Transformer (ViT) has continuously established new milestones in
the computer vision field, while the high computation and memory cost makes its …

CHEX: Channel exploration for CNN model compression

Z Hou, M Qin, F Sun, X Ma, K Yuan… - Proceedings of the …, 2022 - openaccess.thecvf.com
Channel pruning has been broadly recognized as an effective technique to reduce the
computation and memory cost of deep convolutional neural networks. However …

PatDNN: Achieving real-time DNN execution on mobile devices with pattern-based weight pruning

W Niu, X Ma, S Lin, S Wang, X Qian, X Lin… - Proceedings of the …, 2020 - dl.acm.org
With the emergence of a spectrum of high-end mobile devices, many applications that
formerly required desktop-level computation capability are being transferred to these …

Advancing model pruning via bi-level optimization

Y Zhang, Y Yao, P Ram, P Zhao… - Advances in …, 2022 - proceedings.neurips.cc
The deployment constraints in practical applications necessitate the pruning of large-scale
deep learning models, i.e., promoting their weight sparsity. As illustrated by the Lottery Ticket …

PCONV: The missing but desirable sparsity in DNN weight pruning for real-time execution on mobile devices

X Ma, FM Guo, W Niu, X Lin, J Tang, K Ma… - Proceedings of the …, 2020 - ojs.aaai.org
Abstract Model compression techniques on Deep Neural Network (DNN) have been widely
acknowledged as an effective way to achieve acceleration on a variety of platforms, and …

CAP-RAM: A charge-domain in-memory computing 6T-SRAM for accurate and precision-programmable CNN inference

Z Chen, Z Yu, Q Jin, Y He, J Wang, S Lin… - IEEE Journal of Solid …, 2021 - ieeexplore.ieee.org
A compact, accurate, and bitwidth-programmable in-memory computing (IMC) static random-
access memory (SRAM) macro, named CAP-RAM, is presented for energy-efficient …

FILM-QNN: Efficient FPGA acceleration of deep neural networks with intra-layer, mixed-precision quantization

M Sun, Z Li, A Lu, Y Li, SE Chang, X Ma, X Lin… - Proceedings of the …, 2022 - dl.acm.org
With the trend to deploy Deep Neural Network (DNN) inference models on edge devices
with limited resources, quantization techniques have been widely used to reduce on-chip …

Mix and match: A novel FPGA-centric deep neural network quantization framework

SE Chang, Y Li, M Sun, R Shi, HKH So… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Deep Neural Networks (DNNs) have achieved extraordinary performance in various
application domains. To support diverse DNN models, efficient implementations of DNN …