Neural architecture search survey: A computer vision perspective

JS Kang, JK Kang, JJ Kim, KW Jeon, HJ Chung… - Sensors, 2023 - mdpi.com
In recent years, deep learning (DL) has been widely studied using various methods across
the globe, especially with respect to training methods and network structures, proving highly …

Transforming large-size to lightweight deep neural networks for IoT applications

R Mishra, H Gupta - ACM Computing Surveys, 2023 - dl.acm.org
Deep Neural Networks (DNNs) have gained unprecedented popularity due to their high-
order performance and automated feature extraction capability. This has encouraged …

Memorization without overfitting: Analyzing the training dynamics of large language models

K Tirumala, A Markosyan… - Advances in …, 2022 - proceedings.neurips.cc
Despite their wide adoption, the underlying training and memorization dynamics of very
large language models are not well understood. We empirically study exact memorization in …
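
Since the entry above studies "exact memorization", a minimal sketch of one common way to measure it may help: check how many continuation tokens a model reproduces verbatim under greedy decoding. The `logits_fn` interface below is an assumption (e.g., a thin wrapper around any causal language model), not the authors' code.

```python
import torch

def exact_memorization_rate(logits_fn, token_ids, prefix_len):
    """Fraction of continuation tokens the model predicts exactly (greedy argmax).

    logits_fn: assumed callable mapping a (1, T) LongTensor of token ids to (1, T, V) logits.
    token_ids: (T,) LongTensor holding one training sequence.
    prefix_len: number of leading tokens treated as the fixed context.
    """
    with torch.no_grad():
        logits = logits_fn(token_ids.unsqueeze(0))    # (1, T, V)
    preds = logits.argmax(dim=-1)[0]                  # position t predicts token t+1
    targets = token_ids[prefix_len:]                  # tokens the model should reproduce
    greedy = preds[prefix_len - 1:-1]                 # predictions aligned with those targets
    return (greedy == targets).float().mean().item()
```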

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

T Hoefler, D Alistarh, T Ben-Nun, N Dryden… - Journal of Machine …, 2021 - jmlr.org
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …
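
For the pruning theme of the survey above, a minimal sketch of one-shot global magnitude pruning in PyTorch is shown below; the helper name and the choice to skip biases are illustrative assumptions, and practical pipelines usually prune iteratively with fine-tuning between rounds.

```python
import torch
import torch.nn as nn

def global_magnitude_prune(model: nn.Module, sparsity: float = 0.9):
    """Zero out the smallest-magnitude weights across the whole model (one-shot)."""
    weights = [p for _, p in model.named_parameters() if p.dim() > 1]  # skip biases/norms
    all_vals = torch.cat([w.detach().abs().flatten() for w in weights])
    threshold = torch.quantile(all_vals, sparsity)      # global magnitude cutoff
    with torch.no_grad():
        for w in weights:
            w.mul_((w.abs() > threshold).float())       # apply binary mask in place
    return model

# Example: prune a small MLP to 90% sparsity
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
global_magnitude_prune(model, sparsity=0.9)
```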

The lottery ticket hypothesis for pre-trained BERT networks

T Chen, J Frankle, S Chang, S Liu… - Advances in neural …, 2020 - proceedings.neurips.cc
In natural language processing (NLP), enormous pre-trained models like BERT have
become the standard starting point for training on a range of downstream tasks, and similar …
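
The lottery ticket procedure referenced in the title is, at its core, iterative magnitude pruning (IMP) with weight rewinding. The sketch below is a generic, hedged version of that loop; the `train_fn` callable and the rewind-to-initialization choice are assumptions (Chen et al. start from pre-trained BERT weights rather than random initializations).

```python
import copy
import torch

def iterative_magnitude_pruning(model, train_fn, rounds=3, prune_frac=0.2):
    """Lottery-ticket-style IMP: train, prune smallest surviving weights, rewind.

    train_fn(model) is assumed to train the model in place for one round; a full
    implementation also keeps pruned weights at zero during training (e.g., by
    zeroing their gradients).
    """
    init_state = copy.deepcopy(model.state_dict())     # rewind target
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}

    for _ in range(rounds):
        train_fn(model)                                 # train the current subnetwork
        for n, p in model.named_parameters():
            if n not in masks:
                continue
            alive = p.detach().abs()[masks[n].bool()]
            cutoff = torch.quantile(alive, prune_frac)  # prune a fraction of survivors
            masks[n] = masks[n] * (p.detach().abs() > cutoff).float()
        with torch.no_grad():                           # rewind and re-apply the mask
            for n, p in model.named_parameters():
                p.copy_(init_state[n])
                if n in masks:
                    p.mul_(masks[n])
    return model, masks
```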

Linear mode connectivity and the lottery ticket hypothesis

J Frankle, GK Dziugaite, D Roy… - … on Machine Learning, 2020 - proceedings.mlr.press
We study whether a neural network optimizes to the same, linearly connected minimum
under different samples of SGD noise (e.g., random data order and augmentation). We find …
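
Linear mode connectivity, as studied above, asks whether the loss stays low along the straight line between the weights found by two SGD runs. A minimal sketch of that interpolation check follows; `loss_fn` and the barrier definition (peak loss minus the worse endpoint) are simplifying assumptions rather than the paper's exact protocol.

```python
import torch

def linear_path_losses(model, state_a, state_b, loss_fn, num_points=11):
    """Evaluate the loss along the straight line between two weight settings.

    state_a/state_b: state_dicts of two runs; loss_fn(model) is assumed to return
    a scalar loss on a held-out batch. A flat curve (no bump above the endpoints)
    indicates linear mode connectivity. Running statistics are interpolated too,
    which is a simplification.
    """
    losses = []
    for a in torch.linspace(0, 1, num_points):
        mixed = {k: (1 - a) * state_a[k] + a * state_b[k] for k in state_a}
        model.load_state_dict(mixed)
        with torch.no_grad():
            losses.append(float(loss_fn(model)))
    barrier = max(losses) - max(losses[0], losses[-1])  # height of the bump, if any
    return losses, barrier
```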

Where to begin? On the impact of pre-training and initialization in federated learning

J Nguyen, J Wang, K Malik, M Sanjabi… - arxiv preprint arxiv …, 2022 - arxiv.org

DARTS+: Improved differentiable architecture search with early stopping

H Liang, S Zhang, J Sun, X He, W Huang… - arxiv preprint arxiv …, 2019 - arxiv.org
Recently, there has been a growing interest in automating the process of neural architecture
design, and the Differentiable Architecture Search (DARTS) method makes the process …
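
DARTS, named in the snippet above, relaxes the discrete choice among candidate operations into a softmax-weighted mixture, so architecture parameters and network weights can both be optimized by gradient descent. A minimal sketch of that relaxation (the candidate operations and channel size are placeholders) is:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """DARTS-style continuous relaxation of the operation choice on one edge."""

    def __init__(self, channels: int = 16):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),  # candidate: 3x3 conv
            nn.Conv2d(channels, channels, 5, padding=2),  # candidate: 5x5 conv
            nn.Identity(),                                # candidate: skip connection
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture parameters

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# At the end of the search, the operation with the largest alpha is kept on this edge.
```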

Accelerating dataset distillation via model augmentation

L Zhang, J Zhang, B Lei, S Mukherjee… - Proceedings of the …, 2023 - openaccess.thecvf.com
Dataset Distillation (DD), a newly emerging field, aims at generating much smaller but
efficient synthetic training datasets from large ones. Existing DD methods based on gradient …
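
Many of the gradient-based DD methods the abstract refers to optimize the synthetic data so that the gradients it induces in a network match those of real batches. A hedged sketch of such a gradient-matching loss is below; the cosine distance and the function signature are illustrative choices, not the specific objective of Zhang et al.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gradient_matching_loss(model, real_x, real_y, syn_x, syn_y):
    """Gradient-matching objective used by many dataset distillation methods.

    syn_x would be a leaf tensor with requires_grad=True; minimizing this loss
    w.r.t. syn_x pushes the synthetic batch to induce the same parameter
    gradients as the real batch.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    g_real = torch.autograd.grad(
        F.cross_entropy(model(real_x), real_y), params)                    # real-data gradient
    g_syn = torch.autograd.grad(
        F.cross_entropy(model(syn_x), syn_y), params, create_graph=True)   # differentiable w.r.t. syn_x

    # Cosine distance between the two gradient sets, summed over parameter tensors
    loss = 0.0
    for gr, gs in zip(g_real, g_syn):
        loss = loss + (1 - F.cosine_similarity(gr.flatten(), gs.flatten(), dim=0))
    return loss
```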

Understanding the role of training regimes in continual learning

SI Mirzadeh, M Farajtabar, R Pascanu… - Advances in …, 2020 - proceedings.neurips.cc
Catastrophic forgetting affects the training of neural networks, limiting their ability to learn
multiple tasks sequentially. From the perspective of the well established plasticity-stability …
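
As a small worked example of the forgetting the abstract describes, one common measurement is the drop in task-A accuracy after the model is trained sequentially on task B; the callables below are assumed placeholders.

```python
def forgetting_on_task_a(model, eval_task_a, train_task_b):
    """Drop in task-A accuracy after the model additionally trains on task B.

    eval_task_a(model) -> accuracy in [0, 1]; train_task_b(model) trains in place.
    Mirzadeh et al. study how the training regime (learning rate, batch size,
    regularization) changes the size of this drop.
    """
    acc_before = eval_task_a(model)
    train_task_b(model)              # sequential training on the second task
    acc_after = eval_task_a(model)
    return acc_before - acc_after    # larger value = more catastrophic forgetting
```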