Survey on evolutionary deep learning: Principles, algorithms, applications, and open issues

N Li, L Ma, G Yu, B Xue, M Zhang, Y Jin - ACM Computing Surveys, 2023 - dl.acm.org
Over recent years, there has been rapid development of deep learning (DL) in both
industry and academia. However, finding the optimal hyperparameters of a DL model …
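As a rough, generic illustration of what evolutionary hyperparameter search involves (not the survey's specific algorithms), a minimal mutate-and-select loop over a toy search space might look as follows; the search space, the `evaluate` fitness function, and all constants are assumptions made purely for illustration:

```python
import random

# Hypothetical continuous search space; real evolutionary DL encodes much richer
# choices (architectures, schedules, etc.).
SPACE = {
    "learning_rate": (1e-5, 1e-1),
    "dropout": (0.0, 0.5),
    "weight_decay": (0.0, 0.1),
}

def sample():
    """Draw a random hyperparameter configuration from the space."""
    return {k: random.uniform(lo, hi) for k, (lo, hi) in SPACE.items()}

def mutate(cfg, scale=0.1):
    """Perturb each value with Gaussian noise, clipped back into its range."""
    child = {}
    for k, (lo, hi) in SPACE.items():
        child[k] = min(hi, max(lo, cfg[k] + random.gauss(0, scale * (hi - lo))))
    return child

def evaluate(cfg):
    """Placeholder fitness: in practice, train a model and return validation accuracy."""
    return -((cfg["learning_rate"] - 1e-3) ** 2)  # toy objective

def evolve(pop_size=10, generations=5):
    population = [sample() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=evaluate, reverse=True)
        parents = scored[: pop_size // 2]  # truncation selection
        children = [mutate(random.choice(parents)) for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=evaluate)

best = evolve()
```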

DPAL-BERT: A Faster and Lighter Question Answering Model

L Yin, L Wang, Z Cai, S Lu, R Wang… - … in Engineering & …, 2024 - researchgate.net
Recent advancements in natural language processing have given rise to numerous pre-trained
language models in question-answering systems. However, with the constant …

AutoTinyBERT: Automatic hyper-parameter optimization for efficient pre-trained language models

Y Yin, C Chen, L Shang, X Jiang, X Chen… - arXiv preprint arXiv:…, 2021 - arxiv.org
Pre-trained language models (PLMs) have achieved great success in natural language
processing. Most PLMs follow the default setting of architecture hyper-parameters (e.g., the …

Joint structured pruning and dense knowledge distillation for efficient transformer model compression

B Cui, Y Li, Z Zhang - Neurocomputing, 2021 - Elsevier
In this paper, we develop a novel Joint Model Compression (referred to as JMC) method by
combining structured pruning and dense knowledge distillation techniques to significantly …
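For context, the sketch below shows one generic way to combine a soft-target distillation term with a structured-sparsity (pruning-oriented) penalty in a single training objective. This is not the JMC method itself; `student_linear`, the group-lasso grouping over output rows, and the weighting constants are illustrative assumptions (PyTorch):

```python
import torch.nn.functional as F

def group_lasso_penalty(weight):
    """L2 norm per output row, summed: encourages whole rows to shrink toward zero."""
    return weight.norm(dim=1).sum()

def joint_loss(student_logits, teacher_logits, labels, student_linear,
               T=2.0, alpha=0.5, lam=1e-4):
    # Soft-target distillation term: KL between temperature-softened distributions.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Ordinary supervised cross-entropy term.
    ce = F.cross_entropy(student_logits, labels)
    # Structured-sparsity term on one student layer, standing in for pruning.
    sparsity = group_lasso_penalty(student_linear.weight)
    return alpha * kd + (1 - alpha) * ce + lam * sparsity
```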

A novel multi-layer feature fusion-based BERT-CNN for sentence representation learning and classification

KH Alyoubi, FS Alotaibi, A Kumar, V Gupta… - Robotic Intelligence …, 2023 - emerald.com
Purpose: The purpose of this paper is to describe a new approach to sentence
representation learning leading to text classification using Bidirectional Encoder …

F-divergence minimization for sequence-level knowledge distillation

Y Wen, Z Li, W Du, L Mou - arXiv preprint arXiv:2307.15190, 2023 - arxiv.org
Knowledge distillation (KD) is the process of transferring knowledge from a large model to a
small one. It has gained increasing attention in the natural language processing community …
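To make the definition concrete, the sketch below shows a generic token-level distillation loss with a forward/reverse KL switch. The paper studies a broader family of f-divergences, so this is only a simple special case for illustration; the temperature and reduction choices are assumptions (PyTorch):

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, reverse=False, T=1.0):
    """KL(teacher || student) by default, or KL(student || teacher) if reverse=True."""
    s_logp = F.log_softmax(student_logits / T, dim=-1)
    t_logp = F.log_softmax(teacher_logits / T, dim=-1)
    if reverse:
        # Reverse KL: mode-seeking, sometimes preferred when the student is much smaller.
        return F.kl_div(t_logp, s_logp.exp(), reduction="batchmean") * (T * T)
    # Forward KL: mean-seeking, the classic soft-label distillation term.
    return F.kl_div(s_logp, t_logp.exp(), reduction="batchmean") * (T * T)
```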

A comparative analysis of task-agnostic distillation methods for compressing transformer language models

T Udagawa, A Trivedi, M Merler… - arXiv preprint arXiv:…, 2023 - arxiv.org
Large language models have become a vital component in modern NLP, achieving state-of-the-art
performance in a variety of tasks. However, they are often inefficient for real-world …

Research Status and Progress in Evolutionary Deep Learning

L Nan, H Meirui, M Lianbo - Information and Control, 2024 - xk.sia.cn
In recent years, both industry and academia have made significant advances in deep
learning (DL). However, configuring the hyperparameters of deep models typically requires …

BERT-based coupling evaluation of biological strategies in bio-inspired design

F Sun, H Xu, Y Meng, Z Lu, C Gong - Expert Systems with Applications, 2023 - Elsevier
Searching for suitable biological strategies in bio-inspired design (BID) is the first problem
that designers need to solve. Based on the biological strategy database of AskNature, a …

E-lang: Energy-based joint inferencing of super and swift language models

M Akbari, A Banitalebi-Dehkordi, Y Zhang - arXiv preprint arXiv:…, 2022 - arxiv.org
Building huge and highly capable language models has been a trend in recent years.
Despite their great performance, they incur high computational cost. A common solution is to …