Structured pruning for deep convolutional neural networks: A survey

Y He, L Xiao

Distilling CLIP-based models with a student base for video-language retrieval

R Pei, J Liu, W Li, B Shao, S Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Pre-training a vision-language model and then fine-tuning it on downstream tasks has
become a popular paradigm. However, pre-trained vision-language models with the …

PPT: Token-pruned pose transformer for monocular and multi-view human pose estimation

H Ma, Z Wang, Y Chen, D Kong, L Chen, X Liu… - … on Computer Vision, 2022 - Springer
Recently, the vision transformer and its variants have played an increasingly important role
in both monocular and multi-view human pose estimation. Considering image patches as …

A survey on transformer compression

Y Tang, Y Wang, J Guo, Z Tu, K Han, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer plays a vital role in the realms of natural language processing (NLP) and
computer vision (CV), especially for constructing large language models (LLM) and large …

Masked autoencoders enable efficient knowledge distillers

Y Bai, Z Wang, J Xiao, C Wei, H Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
This paper studies the potential of distilling knowledge from pre-trained models, especially
Masked Autoencoders. Our approach is simple: in addition to optimizing the pixel …