Deep learning-based text classification: a comprehensive review
Deep learning-based models have surpassed classical machine learning-based
approaches in various text classification tasks, including sentiment analysis, news …
Surgical fine-tuning improves adaptation to distribution shifts
A common approach to transfer learning under distribution shift is to fine-tune the last few
layers of a pre-trained model, preserving learned features while also adapting to the new …
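To make the last-layers approach concrete, below is a minimal PyTorch sketch, assuming a Hugging Face `bert-base-uncased` sequence classifier; the choice of two unfrozen encoder layers and the learning rate are illustrative, not taken from the cited paper.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Illustrative checkpoint and label count; not from the cited paper.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze everything, then unfreeze only the last two encoder layers
# and the classification head so just those parameters adapt.
for param in model.parameters():
    param.requires_grad = False
for layer in model.bert.encoder.layer[-2:]:
    for param in layer.parameters():
        param.requires_grad = True
for param in model.classifier.parameters():
    param.requires_grad = True

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=2e-5
)
```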
A survey on vision transformer
The Transformer, first applied to the field of natural language processing, is a type of deep neural
network based mainly on the self-attention mechanism. Thanks to its strong representation …
MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers
Pre-trained language models (e.g., BERT (Devlin et al., 2018) and its variants) have achieved
remarkable success in a variety of NLP tasks. However, these models usually consist of …
Source-free domain adaptation for semantic segmentation
Unsupervised Domain Adaptation (UDA) can tackle the challenge that
convolutional neural network (CNN)-based approaches for semantic segmentation heavily …
The lottery ticket hypothesis for pre-trained BERT networks
In natural language processing (NLP), enormous pre-trained models like BERT have
become the standard starting point for training on a range of downstream tasks, and similar …
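As a rough illustration of the pruning step behind lottery-ticket experiments, the sketch below masks the smallest-magnitude weights of a single tensor; the 20% rate and single-tensor scope are assumptions for brevity, and the cited work additionally prunes iteratively and rewinds the surviving weights to their pre-trained values.

```python
import torch

def magnitude_prune(weight: torch.Tensor, mask: torch.Tensor, rate: float = 0.2) -> torch.Tensor:
    """Zero out the `rate` fraction of still-unpruned weights with the smallest magnitude."""
    alive = weight[mask.bool()].abs()
    k = max(1, int(rate * alive.numel()))
    threshold = alive.kthvalue(k).values          # k-th smallest surviving magnitude
    return mask * (weight.abs() > threshold).float()

# Toy usage: prune one linear layer's weight matrix and apply the mask.
layer = torch.nn.Linear(768, 768)
mask = torch.ones_like(layer.weight)
mask = magnitude_prune(layer.weight.data, mask, rate=0.2)
with torch.no_grad():
    layer.weight.mul_(mask)                       # surviving weights keep their values
```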
A survey on transformer compression
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …
MiniLMv2: Multi-head self-attention relation distillation for compressing pretrained transformers
We generalize deep self-attention distillation in MiniLM (Wang et al., 2020) by only using self-
attention relation distillation for task-agnostic compression of pretrained Transformers. In …
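The relation-distillation idea can be sketched in a few lines, assuming (purely for illustration) that query vectors from one teacher layer and one student layer are available as dense tensors; the single-layer setup, head count, and use of queries only are simplifications of the cited method.

```python
import torch
import torch.nn.functional as F

def self_attention_relations(x: torch.Tensor, num_heads: int) -> torch.Tensor:
    """Turn per-token vectors (batch, seq, hidden) into per-head relation maps (batch, heads, seq, seq)."""
    b, s, h = x.shape
    d = h // num_heads
    x = x.view(b, s, num_heads, d).transpose(1, 2)
    scores = torch.matmul(x, x.transpose(-1, -2)) / d ** 0.5   # scaled pairwise relations
    return F.softmax(scores, dim=-1)

def relation_distillation_loss(teacher_x, student_x, num_relation_heads=12):
    # KL divergence from the student's relations to the teacher's.
    t_rel = self_attention_relations(teacher_x, num_relation_heads)
    s_rel = self_attention_relations(student_x, num_relation_heads)
    return F.kl_div(s_rel.clamp_min(1e-9).log(), t_rel, reduction="batchmean")

# Random activations stand in for teacher/student queries; using the same number
# of relation heads keeps the two maps comparable even though the hidden sizes
# differ (768 vs. 384).
teacher_q = torch.randn(2, 16, 768)
student_q = torch.randn(2, 16, 384, requires_grad=True)
loss = relation_distillation_loss(teacher_q, student_q)
loss.backward()
```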
Uncertainty-aware self-training for few-shot text classification
The recent success of pre-trained language models crucially hinges on fine-tuning them on
large amounts of labeled data for the downstream task, which are typically expensive to …
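A minimal sketch of the uncertainty-aware selection step is given below, assuming a Hugging Face-style classifier whose forward pass returns `.logits`; the number of stochastic passes and the entropy threshold are illustrative choices, not the cited paper's settings.

```python
import torch

def mc_dropout_predict(model, input_ids, attention_mask, passes: int = 10):
    """Monte Carlo dropout: average softmax outputs over several stochastic passes."""
    model.train()                                   # keep dropout active at inference time
    probs = []
    with torch.no_grad():
        for _ in range(passes):
            logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
            probs.append(torch.softmax(logits, dim=-1))
    mean_probs = torch.stack(probs).mean(dim=0)     # (batch, num_classes)
    # Predictive entropy of the averaged distribution as the uncertainty score.
    entropy = -(mean_probs * mean_probs.clamp_min(1e-9).log()).sum(dim=-1)
    return mean_probs.argmax(dim=-1), entropy

def select_confident(pseudo_labels, entropy, threshold: float = 0.2):
    """Keep only pseudo-labeled examples whose predictive entropy is below the threshold."""
    keep = entropy < threshold
    return pseudo_labels[keep], keep
```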