DisWOT: Student architecture search for distillation without training

P Dong, L Li, Z Wei - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Knowledge distillation (KD) is an effective training strategy to improve the
lightweight student models under the guidance of cumbersome teachers. However, the large …

Automated knowledge distillation via Monte Carlo tree search

L Li, P Dong, Z Wei, Y Yang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
In this paper, we present Auto-KD, the first automated search framework for optimal
knowledge distillation design. Traditional distillation techniques typically require handcrafted …

EMQ: Evolving training-free proxies for automated mixed precision quantization

P Dong, L Li, Z Wei, X Niu, Z Tian… - Proceedings of the …, 2023 - openaccess.thecvf.com
Mixed-Precision Quantization (MQ) can achieve a competitive accuracy-complexity
trade-off for models. Conventional training-based search methods require time-consuming …

KD-Zero: Evolving knowledge distiller for any teacher-student pairs

L Li, P Dong, A Li, Z Wei… - Advances in Neural …, 2023 - proceedings.neurips.cc
Knowledge distillation (KD) has emerged as an effective technique for compressing
models that can enhance the lightweight model. Conventional KD methods propose various …

SasWOT: Real-time semantic segmentation architecture search without training

C Zhu, L Li, Y Wu, Z Sun - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
In this paper, we present SasWOT, the first training-free Semantic segmentation Architecture
Search (SAS) framework via an auto-discovery proxy. Semantic segmentation is widely used …

Pruner-Zero: Evolving symbolic pruning metric from scratch for large language models

P Dong, L Li, Z Tang, X Liu, X Pan, Q Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the remarkable capabilities, Large Language Models (LLMs) face deployment
challenges due to their extensive size. Pruning methods drop a subset of weights to …

Auto-Prox: Training-free vision transformer architecture search via automatic proxy discovery

Z Wei, P Dong, Z Hui, A Li, L Li, M Lu, H Pan… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
The substantial success of Vision Transformer (ViT) in computer vision tasks is largely
attributed to the architecture design. This underscores the necessity of efficient architecture …

DetKDS: Knowledge distillation search for object detectors

L Li, Y Bao, P Dong, C Yang, A Li, W Luo… - … on Machine Learning, 2024 - openreview.net
In this paper, we present DetKDS, the first framework that searches for optimal detection
distillation policies. Manual design of detection distillers becomes challenging and time …

AMD: Automatic multi-step distillation of large-scale vision models

C Han, Q Wang, SA Dianat, M Rabbani… - … on Computer Vision, 2024 - Springer
Transformer-based architectures have become the de-facto standard models for diverse
vision tasks owing to their superior performance. As the size of these transformer-based …

Auto-GAS: automated proxy discovery for training-free generative architecture search

L Li, H Sun, S Li, P Dong, W Luo, W Xue, Q Liu… - … on Computer Vision, 2024 - Springer
In this paper, we introduce Auto-GAS, the first training-free Generative Architecture Search
(GAS) framework enabled by an auto-discovered proxy. Generative models like Generative …