Make RepVGG greater again: A quantization-aware approach
The tradeoff between performance and inference speed is critical for practical applications.
Architecture reparameterization obtains better tradeoffs and is becoming an increasingly …
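The reparameterization the entry above refers to can be illustrated with a minimal sketch: after training, a 3x3 convolution, a 1x1 convolution, and an identity branch are fused into a single 3x3 convolution. This is a generic PyTorch illustration (no BatchNorm, assumed channel count), not the paper's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

C = 8                                                # channel count (assumed for the example)
conv3 = nn.Conv2d(C, C, 3, padding=1, bias=True)     # 3x3 branch
conv1 = nn.Conv2d(C, C, 1, padding=0, bias=True)     # 1x1 branch

# Build the fused 3x3 kernel: pad the 1x1 kernel to 3x3 and add an identity kernel.
w = conv3.weight.data.clone()
w += F.pad(conv1.weight.data, [1, 1, 1, 1])          # 1x1 kernel placed at the centre of a 3x3
ident = torch.zeros_like(w)
for c in range(C):
    ident[c, c, 1, 1] = 1.0                          # identity branch expressed as a 3x3 kernel
w += ident
b = conv3.bias.data + conv1.bias.data                # identity branch contributes no bias

fused = nn.Conv2d(C, C, 3, padding=1, bias=True)
fused.weight.data, fused.bias.data = w, b

x = torch.randn(1, C, 16, 16)
ref = conv3(x) + conv1(x) + x                        # multi-branch output
assert torch.allclose(fused(x), ref, atol=1e-5)      # single fused conv matches it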
EQ-Net: Elastic quantization neural networks
Current model quantization methods have shown their promising capability in reducing
storage space and computation complexity. However, due to the diversity of quantization …
Fast search of face recognition model for a mobile device based on neural architecture comparator
This paper addresses the face recognition task for offline mobile applications. Using AutoML
techniques, a novel technological framework is proposed to develop a fast neural network …
TexQ: zero-shot network quantization with texture feature distribution calibration
Quantization is an effective way to compress neural networks. By reducing the bit width of
the parameters, the processing efficiency of neural network models at edge devices can be …
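The bit-width reduction mentioned above can be made concrete with a minimal uniform affine quantization sketch in NumPy. The bit width, tensor shape, and function names are illustrative, and TexQ's texture-based calibration is not modeled here.

import numpy as np

def quantize(w, n_bits=8):
    # Map float values onto an unsigned integer grid of 2**n_bits levels.
    qmin, qmax = 0, 2 ** n_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = np.round(qmin - w.min() / scale)
    q = np.clip(np.round(w / scale + zero_point), qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, s, z = quantize(w)                                # 8-bit storage: 4x smaller than fp32
print("max abs error:", np.abs(dequantize(q, s, z) - w).max())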
Design automation for fast, lightweight, and effective deep learning models: A survey
Deep learning technologies have demonstrated remarkable effectiveness in a wide range of
tasks, and deep learning holds the potential to advance a multitude of applications …
Bitwidth-adaptive quantization-aware neural network training: A meta-learning approach
Deep neural network quantization with adaptive bitwidths has gained increasing attention
due to the ease of model deployment on various platforms with different resource budgets. In …
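As a generic sketch of the mechanism behind bitwidth-adaptive quantization-aware training, the snippet below fake-quantizes weights with a straight-through estimator and takes the bit width as an argument so it can be re-sampled each step; it does not reproduce the paper's meta-learning procedure, and all names are illustrative.

import torch

def fake_quantize(x, n_bits):
    # Symmetric signed quantization grid for the given bit width.
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.detach().abs().max() / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: quantized forward pass, identity gradient backward.
    return x + (q - x).detach()

w = torch.randn(16, 16, requires_grad=True)
for n_bits in (2, 4, 8):                             # e.g. a bit width sampled per iteration
    loss = fake_quantize(w, n_bits).pow(2).sum()
    loss.backward()                                  # gradients still flow to the fp32 weights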
Mix-GEMM: An efficient HW-SW architecture for mixed-precision quantized deep neural networks inference on edge devices
Deep Neural Network (DNN) inference based on quantized narrow-precision integer data
represents a promising research direction toward efficient deep learning computations on …
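A plain NumPy reference for the narrow-precision integer arithmetic the entry above targets: int8 operands multiplied with int32 accumulation, then rescaled to floating point. The scales and shapes are assumed for illustration; this is not the Mix-GEMM hardware/software design itself.

import numpy as np

def int8_gemm(a_q, b_q):
    # Accumulate in int32 so int8 x int8 products cannot overflow.
    return a_q.astype(np.int32) @ b_q.astype(np.int32)

rng = np.random.default_rng(0)
a_q = rng.integers(-128, 128, size=(4, 64), dtype=np.int8)
b_q = rng.integers(-128, 128, size=(64, 8), dtype=np.int8)
acc = int8_gemm(a_q, b_q)                            # int32 accumulator
sa, sb = 0.02, 0.05                                  # assumed per-tensor quantization scales
print(acc.dtype, (acc * sa * sb).dtype)              # int32 accumulator, float result after rescaling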
MixPath: A unified approach for one-shot neural architecture search
Blending multiple convolutional kernels has proven advantageous in neural architecture
design. However, current two-stage neural architecture search methods are mainly limited to …
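A minimal sketch of what "blending multiple convolutional kernels" can look like in code: a block that sums the outputs of parallel 3x3 and 5x5 convolutions. The class name and channel count are illustrative; MixPath's shadow batch normalization and search procedure are not shown.

import torch
import torch.nn as nn

class MultiKernelBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.k3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.k5 = nn.Conv2d(channels, channels, 5, padding=2)

    def forward(self, x):
        return self.k3(x) + self.k5(x)               # blend two kernel sizes on the same path

y = MultiKernelBlock(8)(torch.randn(1, 8, 32, 32))
print(y.shape)                                       # torch.Size([1, 8, 32, 32])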
QuantNAS: Quantization-aware Neural Architecture Search For Efficient Deployment On Mobile Device
T Gao, L Guo, S Zhao, P Xu, Y Yang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Deep convolutional networks are increasingly applied in mobile AI scenarios. To achieve
efficient deployment, researchers combine neural architecture search (NAS) and …
SpaceEvo: Hardware-friendly search space design for efficient INT8 inference
The combination of Neural Architecture Search (NAS) and quantization has proven
successful in automatically designing low-FLOPs INT8 quantized neural networks (QNN) …