Deep learning-based text classification: a comprehensive review

S Minaee, N Kalchbrenner, E Cambria… - ACM computing …, 2021 - dl.acm.org
Deep learning-based models have surpassed classical machine learning-based
approaches in various text classification tasks, including sentiment analysis, news …

Surgical fine-tuning improves adaptation to distribution shifts

Y Lee, AS Chen, F Tajwar, A Kumar, H Yao… - arXiv preprint arXiv …, 2022 - arxiv.org
A common approach to transfer learning under distribution shift is to fine-tune the last few
layers of a pre-trained model, preserving learned features while also adapting to the new …
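
Since the snippet describes the common baseline of tuning only the last few layers, here is a minimal PyTorch sketch of freezing all but a chosen block of layers; the toy model, the choice of which block to unfreeze, and the optimizer settings are illustrative assumptions, not the paper's recipe (which layers to tune is exactly what the paper's "surgical" variant investigates).

```python
import torch
import torch.nn as nn

# Toy stand-in for a pre-trained network (illustrative assumption).
model = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),   # early feature layers
    nn.Linear(256, 256), nn.ReLU(),   # middle layers
    nn.Linear(256, 10),               # final task head
)

# Freeze every parameter, then unfreeze only the chosen block
# (here the last layer; a "surgical" choice would pick the block to suit the shift).
for p in model.parameters():
    p.requires_grad = False
for p in model[-1].parameters():
    p.requires_grad = True

# Optimize only the unfrozen parameters on the shifted target data.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```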

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …
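
As a pointer for the self-attention mechanism the snippet refers to, a minimal single-head scaled dot-product attention sketch; the sequence length, dimensions, and random weights are illustrative assumptions.

```python
import torch

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over tokens x of shape (seq, dim)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / k.shape[-1] ** 0.5      # pairwise token affinities
    weights = torch.softmax(scores, dim=-1)    # each token attends over all tokens
    return weights @ v                         # attention-weighted mix of value vectors

x = torch.randn(16, 64)                        # 16 tokens, 64-dim embeddings
w_q, w_k, w_v = (torch.randn(64, 64) * 0.1 for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)         # shape (16, 64)
```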

MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers

W Wang, F Wei, L Dong, H Bao… - Advances in Neural …, 2020 - proceedings.neurips.cc
Pre-trained language models (e.g., BERT (Devlin et al., 2018) and its variants) have achieved
remarkable success in a variety of NLP tasks. However, these models usually consist of …

Source-free domain adaptation for semantic segmentation

Y Liu, W Zhang, J Wang - … of the IEEE/CVF Conference on …, 2021 - openaccess.thecvf.com
Unsupervised Domain Adaptation (UDA) can tackle the challenge that
convolutional neural network (CNN)-based approaches for semantic segmentation heavily …

The lottery ticket hypothesis for pre-trained BERT networks

T Chen, J Frankle, S Chang, S Liu… - Advances in neural …, 2020 - proceedings.neurips.cc
In natural language processing (NLP), enormous pre-trained models like BERT have
become the standard starting point for training on a range of downstream tasks, and similar …

A survey on transformer compression

Y Tang, Y Wang, J Guo, Z Tu, K Han, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …

MiniLMv2: Multi-head self-attention relation distillation for compressing pretrained transformers

W Wang, H Bao, S Huang, L Dong, F Wei - arXiv preprint arXiv …, 2020 - arxiv.org
We generalize deep self-attention distillation in MiniLM (Wang et al., 2020) by only using self-
attention relation distillation for task-agnostic compression of pretrained Transformers. In …
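
A rough sketch of the kind of self-attention relation distillation the snippet names, under the assumption that "relations" are pairwise scaled dot-products among per-token vectors (for example, the value vectors of one attention head), matched between teacher and student with a KL term; the papers' exact choice of relations, heads, and layers may differ.

```python
import torch
import torch.nn.functional as F

def relation_matrix(vectors):
    """Pairwise scaled dot-product relations among per-token vectors of shape (seq, dim)."""
    scores = vectors @ vectors.T / vectors.shape[-1] ** 0.5
    return F.log_softmax(scores, dim=-1)

def relation_distillation_loss(teacher_vecs, student_vecs):
    """KL divergence pulling the student's token-to-token relations toward the teacher's."""
    t = relation_matrix(teacher_vecs).exp()   # teacher relation distribution
    s = relation_matrix(student_vecs)         # student log-distribution
    return F.kl_div(s, t, reduction="batchmean")

teacher_v = torch.randn(16, 64)               # e.g. value vectors from one teacher head
student_v = torch.randn(16, 32)               # student may use a smaller per-head size
loss = relation_distillation_loss(teacher_v, student_v)
```

One design point the sketch makes concrete: the relation matrices are seq x seq, so they remain comparable even when teacher and student use different hidden sizes.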

Uncertainty-aware self-training for few-shot text classification

S Mukherjee, A Awadallah - Advances in Neural …, 2020 - proceedings.neurips.cc
Recent success of pre-trained language models crucially hinges on fine-tuning them on
large amounts of labeled data for the downstream task, which are typically expensive to …
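
The snippet breaks off before the method, so the following is only a generic, hypothetical sketch of uncertainty-aware pseudo-label selection for self-training (filtering by prediction entropy); the paper's actual uncertainty estimation and sampling strategy may well differ.

```python
import torch

def select_confident_pseudo_labels(model, x_unlabeled, entropy_threshold=0.5):
    """Pseudo-label unlabeled inputs, keeping only low-entropy (low-uncertainty) predictions."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(x_unlabeled), dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    keep = entropy < entropy_threshold
    return x_unlabeled[keep], probs[keep].argmax(dim=-1)

# Hypothetical usage with a toy classifier; the selected pairs would be added
# to the labeled set for the next self-training round.
toy_classifier = torch.nn.Linear(64, 3)
x_unlabeled = torch.randn(100, 64)
x_selected, y_pseudo = select_confident_pseudo_labels(toy_classifier, x_unlabeled)
```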