Neural architecture search: Insights from 1000 papers

C White, M Safari, R Sukthanker, B Ru, T Elsken… - arXiv preprint arXiv …, 2023 - arxiv.org
In the past decade, advances in deep learning have resulted in breakthroughs in a variety of
areas, including computer vision, natural language understanding, speech recognition, and …

Neural architecture search for transformers: A survey

KT Chitty-Venkata, M Emani, V Vishwanath… - IEEE …, 2022 - ieeexplore.ieee.org
Transformer-based Deep Neural Network architectures have gained tremendous interest
due to their effectiveness in various applications across Natural Language Processing (NLP) …

Evolutionary neural architecture search for transformer in knowledge tracing

S Yang, X Yu, Y Tian, X Yan, H Ma… - Advances in Neural …, 2023 - proceedings.neurips.cc
Knowledge tracing (KT) aims to trace students' knowledge states by predicting
whether students answer correctly on exercises. Despite the excellent performance of …
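
The snippet defines the knowledge-tracing task: predict whether a student answers the next exercise correctly from their interaction history. As a toy illustration of that prediction setup (a DKT-style recurrent baseline, not the evolved transformer the paper searches for), one might write:

import torch
import torch.nn as nn

class TinyKTModel(nn.Module):
    # Toy knowledge-tracing predictor: encode the (exercise id, correctness)
    # history with an LSTM and predict P(correct) for every exercise.
    def __init__(self, num_exercises, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(2 * num_exercises, hidden_dim)  # one token per (exercise, correct) pair
        self.rnn = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_exercises)

    def forward(self, exercise_ids, correct):
        tokens = exercise_ids * 2 + correct        # fold correctness into the token id
        h, _ = self.rnn(self.embed(tokens))
        return torch.sigmoid(self.out(h))          # (batch, time, num_exercises) probabilities

model = TinyKTModel(num_exercises=100)
ex = torch.randint(0, 100, (2, 20))                # 2 students, 20 past interactions
ok = torch.randint(0, 2, (2, 20))
probs = model(ex, ok)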

ZiCo: Zero-shot NAS via inverse coefficient of variation on gradients

G Li, Y Yang, K Bhardwaj, R Marculescu - arXiv preprint arXiv:2301.11300, 2023 - arxiv.org
Neural Architecture Search (NAS) is widely used to automatically obtain the neural network
with the best performance among a large number of candidate architectures. To reduce the …
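
The snippet cuts off before the method itself, but the title points to a zero-cost proxy built from gradient statistics. The sketch below only illustrates that general idea in PyTorch; the exact ZiCo score and its normalization are not given in the snippet, so the scoring formula here (per-layer log of summed |mean|/std of gradients over a few minibatches) is an assumption:

import torch

def gradient_statistics_score(model, loss_fn, batches):
    # Accumulate per-parameter gradients over a few minibatches (use at least two),
    # then score each parameter group by |mean| / std of its gradients and sum the logs.
    grads = {name: [] for name, p in model.named_parameters() if p.requires_grad}
    for inputs, targets in batches:
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        for name, p in model.named_parameters():
            if p.grad is not None:
                grads[name].append(p.grad.detach().flatten().clone())
    score = 0.0
    for g in grads.values():
        if len(g) < 2:
            continue
        g = torch.stack(g)                          # (num_batches, num_params)
        ratio = g.mean(dim=0).abs() / (g.std(dim=0) + 1e-8)
        score += torch.log(ratio.sum() + 1e-8).item()
    return score                                    # higher is assumed to rank architectures better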

ElasticViT: Conflict-aware supernet training for deploying fast vision transformer on diverse mobile devices

C Tang, LL Zhang, H Jiang, J Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Neural Architecture Search (NAS) has shown promising performance in the
automatic design of vision transformers (ViT) exceeding 1G FLOPs. However, designing …

Boosting Order-Preserving and Transferability for Neural Architecture Search: a Joint Architecture Refined Search and Fine-tuning Approach

B Zhang, X Wang, X Qin, J Yan - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Supernet is a core component in many recent Neural Architecture Search (NAS) methods. It
not only helps embody the search space but also provides a (relative) estimation of the final …
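
Since the snippet frames the supernet as a relative estimator, what matters is whether it preserves the ranking of candidate architectures. A hypothetical way to quantify that (the numbers below are made up for illustration) is the rank correlation between accuracies estimated with inherited supernet weights and accuracies from stand-alone training:

from scipy.stats import kendalltau

supernet_estimates  = [0.62, 0.58, 0.71, 0.65, 0.60]   # accuracy with inherited supernet weights
standalone_accuracy = [0.74, 0.70, 0.80, 0.76, 0.71]   # accuracy after training from scratch
tau, _ = kendalltau(supernet_estimates, standalone_accuracy)
print(f"Kendall tau = {tau:.2f}")                       # closer to 1.0 means better order preservation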

EMT-NAS: Transferring architectural knowledge between tasks from different datasets

P Liao, Y Jin, W Du - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
The success of multi-task learning (MTL) can largely be attributed to the shared
representation of related tasks, allowing the models to better generalise. In deep learning …
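
The premise cited in the snippet, a representation shared across related tasks, is easiest to see in a hard-parameter-sharing model: one trunk, one small head per task. The sketch below illustrates only that premise, not EMT-NAS's cross-dataset architecture transfer:

import torch
import torch.nn as nn

class SharedTrunkMTL(nn.Module):
    # Minimal hard parameter sharing: a shared trunk produces one representation,
    # and each task reads it through its own head.
    def __init__(self, in_dim, hidden_dim, task_out_dims):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, d) for d in task_out_dims])

    def forward(self, x):
        h = self.trunk(x)                            # shared representation
        return [head(h) for head in self.heads]      # one prediction per task

model = SharedTrunkMTL(in_dim=32, hidden_dim=64, task_out_dims=[10, 5])
outputs = model(torch.randn(4, 32))                  # two task outputs: (4, 10) and (4, 5)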

PA&DA: Jointly sampling path and data for consistent NAS

S Lu, Y Hu, L Yang, Z Sun, J Mei… - Proceedings of the …, 2023 - openaccess.thecvf.com
Based on the weight-sharing mechanism, one-shot NAS methods train a supernet and then
inherit the pre-trained weights to evaluate sub-models, largely reducing the search cost …
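
The weight-sharing mechanism the snippet describes can be reduced to a single layer: every sub-model reads a slice of one shared weight matrix, so candidates are evaluated with inherited weights instead of being retrained. A minimal sketch (a toy slimmable layer, not the paper's supernet or its path/data sampling):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SlimmableLinear(nn.Module):
    # One shared weight matrix; a sampled sub-model picks how many output
    # units (a slice of the supernet weights) it uses.
    def __init__(self, in_features, max_out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(max_out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(max_out_features))

    def forward(self, x, out_features):
        return F.linear(x, self.weight[:out_features], self.bias[:out_features])

layer = SlimmableLinear(in_features=16, max_out_features=64)
x = torch.randn(8, 16)
for width in (16, 32, 64):                           # three candidate sub-models share the same weights
    print(width, layer(x, width).shape)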

FemtoDet: An object detection baseline for energy versus performance tradeoffs

P Tu, X Xie, G Ai, Y Li, Y Huang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Efficient detectors for edge devices are often optimized for parameter count or speed metrics,
which correlate only weakly with the detectors' energy consumption. However, some …

SKDBERT: compressing BERT via stochastic knowledge distillation

Z Ding, G Jiang, S Zhang, L Guo, W Lin - Proceedings of the AAAI …, 2023 - ojs.aaai.org
In this paper, we propose Stochastic Knowledge Distillation (SKD) to obtain a compact BERT-
style language model dubbed SKDBERT. In each distillation iteration, SKD samples a …
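
The snippet is truncated right after "SKD samples a …"; assuming it samples one teacher per distillation iteration from an ensemble (the sampling distribution below is an assumption), a single distillation step could look like this:

import torch
import torch.nn.functional as F

def skd_step(student, teachers, probs, batch, temperature=2.0):
    # Sample one teacher for this iteration, then match the student's softened
    # logits to the sampled teacher's with a KL-divergence loss.
    idx = torch.multinomial(torch.tensor(probs), 1).item()
    with torch.no_grad():
        t_logits = teachers[idx](batch)
    s_logits = student(batch)
    return F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

# Toy usage with linear layers standing in for BERT-style teachers and student.
student = torch.nn.Linear(8, 4)
teachers = [torch.nn.Linear(8, 4) for _ in range(3)]
loss = skd_step(student, teachers, probs=[0.2, 0.3, 0.5], batch=torch.randn(16, 8))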