Pre-trained models for natural language processing: A survey

X Qiu, T Sun, Y Xu, Y Shao, N Dai, X Huang - Science China …, 2020‏ - Springer
Recently, the emergence of pre-trained models (PTMs) has brought natural language
processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs …

Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks

L Wang, KJ Yoon - IEEE transactions on pattern analysis and …, 2021‏ - ieeexplore.ieee.org
Deep neural models, in recent years, have been successful in almost every field, even
solving the most complex problem statements. However, these models are huge in size with …

Fnet: Mixing tokens with fourier transforms

J Lee-Thorp, J Ainslie, I Eckstein, S Ontanon - arxiv preprint arxiv …, 2021‏ - arxiv.org
We show that Transformer encoder architectures can be sped up, with limited accuracy
costs, by replacing the self-attention sublayers with simple linear transformations that" mix" …

Structured pruning learns compact and accurate models

M **a, Z Zhong, D Chen - arxiv preprint arxiv:2204.00408, 2022‏ - arxiv.org
The growing size of neural language models has led to increased attention in model
compression. The two predominant approaches are pruning, which gradually removes …

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE transactions on …, 2022‏ - ieeexplore.ieee.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

DiffCSE: Difference-based contrastive learning for sentence embeddings

YS Chuang, R Dangovski, H Luo, Y Zhang… - arxiv preprint arxiv …, 2022‏ - arxiv.org
We propose DiffCSE, an unsupervised contrastive learning framework for learning sentence
embeddings. DiffCSE learns sentence embeddings that are sensitive to the difference …

Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021‏ - Springer
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …