Deep learning-based text classification: a comprehensive review

S Minaee, N Kalchbrenner, E Cambria… - ACM computing …, 2021 - dl.acm.org
Deep learning-based models have surpassed classical machine learning-based
approaches in various text classification tasks, including sentiment analysis, news …

Surgical fine-tuning improves adaptation to distribution shifts

Y Lee, AS Chen, F Tajwar, A Kumar, H Yao… - arXiv preprint arXiv …, 2022 - arxiv.org
A common approach to transfer learning under distribution shift is to fine-tune the last few
layers of a pre-trained model, preserving learned features while also adapting to the new …
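
Since the snippet describes the common baseline of tuning only the last few layers, here is a minimal PyTorch sketch of freezing all but a chosen block of layers; the toy model, the choice of which block to unfreeze, and the optimizer settings are illustrative assumptions, not the paper's recipe (which layers to tune is exactly what the paper's "surgical" variant investigates).

```python
import torch
import torch.nn as nn

# Toy stand-in for a pre-trained network (illustrative assumption).
model = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),   # early feature layers
    nn.Linear(256, 256), nn.ReLU(),   # middle layers
    nn.Linear(256, 10),               # final task head
)

# Freeze every parameter, then unfreeze only the chosen block
# (here the last layer; a "surgical" choice would pick the block to suit the shift).
for p in model.parameters():
    p.requires_grad = False
for p in model[-1].parameters():
    p.requires_grad = True

# Optimize only the unfrozen parameters on the shifted target data.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```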

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …
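
As a pointer for the self-attention mechanism the snippet refers to, a minimal single-head scaled dot-product attention sketch; the sequence length, dimensions, and random weights are illustrative assumptions.

```python
import torch

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over tokens x of shape (seq, dim)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / k.shape[-1] ** 0.5      # pairwise token affinities
    weights = torch.softmax(scores, dim=-1)    # each token attends over all tokens
    return weights @ v                         # attention-weighted mix of value vectors

x = torch.randn(16, 64)                        # 16 tokens, 64-dim embeddings
w_q, w_k, w_v = (torch.randn(64, 64) * 0.1 for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)         # shape (16, 64)
```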

MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers

W Wang, F Wei, L Dong, H Bao… - Advances in Neural …, 2020 - proceedings.neurips.cc
Pre-trained language models (e.g., BERT (Devlin et al., 2018) and its variants) have achieved
remarkable success in a variety of NLP tasks. However, these models usually consist of …

Source-free domain adaptation for semantic segmentation

Y Liu, W Zhang, J Wang - … of the IEEE/CVF Conference on …, 2021 - openaccess.thecvf.com
Unsupervised Domain Adaptation (UDA) can tackle the challenge that
convolutional neural network (CNN)-based approaches for semantic segmentation heavily …

The lottery ticket hypothesis for pre-trained BERT networks

T Chen, J Frankle, S Chang, S Liu… - Advances in neural …, 2020 - proceedings.neurips.cc
In natural language processing (NLP), enormous pre-trained models like BERT have
become the standard starting point for training on a range of downstream tasks, and similar …

A survey on transformer compression

Y Tang, Y Wang, J Guo, Z Tu, K Han, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …

MiniLMv2: Multi-head self-attention relation distillation for compressing pretrained transformers

W Wang, H Bao, S Huang, L Dong, F Wei - arXiv preprint arXiv …, 2020 - arxiv.org
We generalize deep self-attention distillation in MiniLM (Wang et al., 2020) by only using self-
attention relation distillation for task-agnostic compression of pretrained Transformers. In …
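
A rough sketch of the kind of self-attention relation distillation the snippet names, under the assumption that "relations" are pairwise scaled dot-products among per-token vectors (for example, the value vectors of one attention head), matched between teacher and student with a KL term; the papers' exact choice of relations, heads, and layers may differ.

```python
import torch
import torch.nn.functional as F

def relation_matrix(vectors):
    """Pairwise scaled dot-product relations among per-token vectors of shape (seq, dim)."""
    scores = vectors @ vectors.T / vectors.shape[-1] ** 0.5
    return F.log_softmax(scores, dim=-1)

def relation_distillation_loss(teacher_vecs, student_vecs):
    """KL divergence pulling the student's token-to-token relations toward the teacher's."""
    t = relation_matrix(teacher_vecs).exp()   # teacher relation distribution
    s = relation_matrix(student_vecs)         # student log-distribution
    return F.kl_div(s, t, reduction="batchmean")

teacher_v = torch.randn(16, 64)               # e.g. value vectors from one teacher head
student_v = torch.randn(16, 32)               # student may use a smaller per-head size
loss = relation_distillation_loss(teacher_v, student_v)
```

One design point the sketch makes concrete: the relation matrices are seq x seq, so they remain comparable even when teacher and student use different hidden sizes.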

Uncertainty-aware self-training for few-shot text classification

S Mukherjee, A Awadallah - Advances in Neural …, 2020 - proceedings.neurips.cc
Recent success of pre-trained language models crucially hinges on fine-tuning them on
large amounts of labeled data for the downstream task, which are typically expensive to …
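
The snippet breaks off before the method, so the following is only a generic, hypothetical sketch of uncertainty-aware pseudo-label selection for self-training (filtering by prediction entropy); the paper's actual uncertainty estimation and sampling strategy may well differ.

```python
import torch

def select_confident_pseudo_labels(model, x_unlabeled, entropy_threshold=0.5):
    """Pseudo-label unlabeled inputs, keeping only low-entropy (low-uncertainty) predictions."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(x_unlabeled), dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    keep = entropy < entropy_threshold
    return x_unlabeled[keep], probs[keep].argmax(dim=-1)

# Hypothetical usage with a toy classifier; the selected pairs would be added
# to the labeled set for the next self-training round.
toy_classifier = torch.nn.Linear(64, 3)
x_unlabeled = torch.randn(100, 64)
x_selected, y_pseudo = select_confident_pseudo_labels(toy_classifier, x_unlabeled)
```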