AMMUS: A survey of transformer-based pretrained models in natural language processing

KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv …, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …

Dawn of the transformer era in speech emotion recognition: closing the valence gap

J Wagner, A Triantafyllopoulos… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Recent advances in transformer-based architectures have shown promise in several
machine learning tasks. In the audio domain, such architectures have been successfully …

Structured pruning learns compact and accurate models

M Xia, Z Zhong, D Chen - arXiv preprint arXiv:2204.00408, 2022 - arxiv.org
The growing size of neural language models has led to increased attention in model
compression. The two predominant approaches are pruning, which gradually removes …
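
As a rough illustration of what structured pruning means in this setting: whole units such as attention heads are scored and removed, rather than individual weights. The sketch below uses a simple magnitude-style head score on an attention output projection; the scoring heuristic, shapes, and names are illustrative assumptions, not this paper's method, which learns masks jointly with a distillation objective.

import torch

def head_scores(w_o: torch.Tensor, num_heads: int) -> torch.Tensor:
    # w_o: (hidden, hidden) attention output projection; score each head by the
    # L2 norm of its block of input columns (a simple magnitude-style proxy).
    hidden = w_o.shape[1]
    head_dim = hidden // num_heads
    blocks = w_o.view(hidden, num_heads, head_dim)  # split columns per head
    return blocks.norm(dim=(0, 2))                  # one score per head

def head_mask(w_o: torch.Tensor, num_heads: int, keep: int) -> torch.Tensor:
    # Structured pruning: keep the `keep` highest-scoring heads, zero out the rest.
    scores = head_scores(w_o, num_heads)
    mask = torch.zeros(num_heads)
    mask[scores.topk(keep).indices] = 1.0
    return mask  # multiply each head's output by its mask entry at inference

w_o = torch.randn(768, 768)                  # e.g. a BERT-base output projection
print(head_mask(w_o, num_heads=12, keep=8))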

MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers

W Wang, F Wei, L Dong, H Bao… - Advances in neural …, 2020 - proceedings.neurips.cc
Pre-trained language models (e.g., BERT (Devlin et al., 2018) and its variants) have achieved
remarkable success in varieties of NLP tasks. However, these models usually consist of …
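
The "deep self-attention distillation" in the title trains a small student so that its self-attention distributions mimic the teacher's. Below is a minimal sketch of only the attention-distribution term (the paper also distills value relations and applies the loss to the last layer only); tensor names and shapes are assumptions.

import torch
import torch.nn.functional as F

def attention_distill_loss(teacher_attn: torch.Tensor,
                           student_attn: torch.Tensor) -> torch.Tensor:
    # teacher_attn, student_attn: (batch, heads, seq, seq) attention probabilities
    # from one layer. KL(teacher || student), summed over heads/queries/keys and
    # averaged over the batch (PyTorch's "batchmean").
    t = teacher_attn.clamp_min(1e-9)
    s = student_attn.clamp_min(1e-9)
    return F.kl_div(s.log(), t, reduction="batchmean")

# toy usage with random attention distributions over a length-4 sequence
t = torch.softmax(torch.randn(2, 12, 4, 4), dim=-1)
s = torch.softmax(torch.randn(2, 12, 4, 4), dim=-1)
print(attention_distill_loss(t, s))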

TinyBERT: Distilling BERT for natural language understanding

X Jiao, Y Yin, L Shang, X Jiang, X Chen, L Li… - arXiv preprint arXiv …, 2019 - arxiv.org
Language model pre-training, such as BERT, has significantly improved the performance of
many natural language processing tasks. However, pre-trained language models are …

Block pruning for faster transformers

F Lagunas, E Charlaix, V Sanh, AM Rush - arXiv preprint arXiv …, 2021 - arxiv.org
Pre-training has improved model accuracy for both classification and generation tasks at the
cost of introducing much larger and slower models. Pruning methods have proven to be an …

Movement pruning: Adaptive sparsity by fine-tuning

V Sanh, T Wolf, A Rush - Advances in neural information …, 2020 - proceedings.neurips.cc
Magnitude pruning is a widely used strategy for reducing model size in pure supervised
learning; however, it is less effective in the transfer learning regime that has become …
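
The contrast drawn here is between magnitude pruning (keep the weights with the largest absolute value) and movement pruning (keep the weights that move away from zero during fine-tuning, measured by an accumulated gradient-times-weight score). A rough sketch of both scores follows; the accumulated-gradient bookkeeping is an assumption, and the paper's actual scores are learned end-to-end with a straight-through estimator.

import torch

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Magnitude pruning: keep the largest |w|, ignoring how fine-tuning moved them.
    k = max(1, int(weight.numel() * (1 - sparsity)))
    threshold = weight.abs().flatten().topk(k).values.min()
    return (weight.abs() >= threshold).float()

def movement_scores(weight: torch.Tensor, grad_sum: torch.Tensor) -> torch.Tensor:
    # Movement-style importance: S_i ~ -sum_t (dL/dw_i) * w_i, which is large for
    # weights that moved away from zero during fine-tuning. `grad_sum` is an
    # assumed running sum of gradients collected over training steps.
    return -(grad_sum * weight)

w = torch.randn(4, 4)
g = torch.randn(4, 4)                         # stand-in for accumulated gradients
print(magnitude_mask(w, sparsity=0.5))
print(movement_scores(w, g).flatten().topk(8).indices)   # indices of kept weights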

Parameter-efficient transfer learning with diff pruning

D Guo, AM Rush, Y Kim - arXiv preprint arXiv:2012.07463, 2020 - arxiv.org
While task-specific finetuning of pretrained networks has led to significant empirical
advances in NLP, the large size of networks makes finetuning difficult to deploy in multi-task …
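
Diff pruning keeps the pretrained parameters frozen and learns a sparse task-specific difference vector on top of them, so only the small diff needs to be stored per task. A minimal sketch below, using a plain L1 penalty as a stand-in for the paper's relaxed L0 regularizer; the module name and penalty weight are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffPrunedLinear(nn.Module):
    # Frozen pretrained weight plus a learnable task-specific diff; only the diff
    # is trained and sparsified (here with an L1 penalty as a simplification).
    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        self.weight = pretrained.weight.detach()     # frozen copy
        self.bias = pretrained.bias.detach() if pretrained.bias is not None else None
        self.diff = nn.Parameter(torch.zeros_like(self.weight))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.weight + self.diff, self.bias)

    def sparsity_penalty(self) -> torch.Tensor:
        return self.diff.abs().sum()

layer = DiffPrunedLinear(nn.Linear(768, 768))
x = torch.randn(2, 768)
loss = layer(x).pow(2).mean() + 1e-4 * layer.sparsity_penalty()
loss.backward()   # gradients reach only the diff, not the frozen weight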

Full stack optimization of transformer inference: a survey

S Kim, C Hooper, T Wattanawong, M Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …

The optimal BERT surgeon: Scalable and accurate second-order pruning for large language models

E Kurtic, D Campos, T Nguyen, E Frantar… - arXiv preprint arXiv …, 2022 - arxiv.org
Transformer-based language models have become a key building block for natural
language processing. While these models are extremely accurate, they can be too large and …
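
Second-order pruning here means scoring each weight by the loss increase predicted from a quadratic (Optimal Brain Surgeon style) approximation, rho_i = w_i^2 / (2 [H^-1]_ii), with the Hessian estimated from gradients. The sketch below uses only a diagonal empirical-Fisher approximation as an assumed simplification; the paper's contribution is making block-wise inverse estimates scale to large models, which this does not capture.

import torch

def obs_saliency(weight: torch.Tensor, grad_samples: torch.Tensor,
                 damp: float = 1e-4) -> torch.Tensor:
    # OBS-style saliency rho_i = w_i^2 / (2 * [H^-1]_ii) with H approximated by a
    # *diagonal* empirical Fisher built from per-example gradients (grad_samples:
    # (num_samples, *weight.shape), assumed to be collected elsewhere).
    fisher_diag = grad_samples.pow(2).mean(dim=0) + damp    # H ~ diag(Fisher)
    h_inv_diag = 1.0 / fisher_diag                          # diagonal inverse
    return weight.pow(2) / (2.0 * h_inv_diag)

w = torch.randn(16)
g = torch.randn(32, 16)                      # stand-in for 32 per-example gradients
scores = obs_saliency(w, g)
mask = (scores >= scores.topk(8).values.min()).float()      # keep the top half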