Structured pruning for deep convolutional neural networks: A survey
The remarkable performance of deep convolutional neural networks (CNNs) is generally
attributed to their deeper and wider architectures, which can come with significant …
A comprehensive survey on model compression and acceleration
In recent years, machine learning (ML) and deep learning (DL) have shown remarkable
improvement in computer vision, natural language processing, stock prediction, forecasting …
FlashAttention: Fast and memory-efficient exact attention with IO-awareness
Transformers are slow and memory-hungry on long sequences, since the time and memory
complexity of self-attention are quadratic in sequence length. Approximate attention …
On-device training under 256kb memory
On-device training enables the model to adapt to new data collected from the sensors by
fine-tuning a pre-trained model. Users can benefit from customized AI models without having …
A-ViT: Adaptive tokens for efficient vision transformer
We introduce A-ViT, a method that adaptively adjusts the inference cost of the vision
transformer (ViT) for images of different complexity. A-ViT achieves this by automatically reducing the …
Dynamic neural networks: A survey
Dynamic neural networks are an emerging research topic in deep learning. Compared to static
models, which have fixed computational graphs and parameters at the inference stage …
Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …
Pruning and quantization for deep neural network acceleration: A survey
Deep neural networks have been applied in many applications exhibiting extraordinary
abilities in the field of computer vision. However, complex network architectures challenge …
AdaViT: Adaptive vision transformers for efficient image recognition
Built on top of self-attention mechanisms, vision transformers have demonstrated
remarkable performance on a variety of vision tasks recently. While achieving excellent …
MCUNet: Tiny deep learning on IoT devices
Machine learning on tiny IoT devices based on microcontroller units (MCUs) is
appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude …