Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

A primer on contrastive pretraining in language processing: Methods, lessons learned, and perspectives

N Rethmeier, I Augenstein - ACM Computing Surveys, 2023 - dl.acm.org
Modern natural language processing (NLP) methods employ self-supervised pretraining
objectives such as masked language modeling to boost the performance of various …
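The entry above refers to masked-language-modeling pretraining. As a hedged illustration only, the minimal PyTorch sketch below shows an MLM-style objective with a toy encoder and arbitrary sizes; none of the names or numbers come from the cited survey.

```python
import torch
import torch.nn as nn

# Toy masked-language-modeling objective (illustrative only, not the cited paper's code).
vocab_size, hidden, mask_id = 1000, 64, 3

encoder = nn.Sequential(
    nn.Embedding(vocab_size, hidden),
    nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
)
lm_head = nn.Linear(hidden, vocab_size)

tokens = torch.randint(4, vocab_size, (8, 16))           # batch of token ids
mask = torch.rand(tokens.shape) < 0.15                    # mask roughly 15% of positions
inputs = tokens.masked_fill(mask, mask_id)                # replace masked positions with a [MASK] id

logits = lm_head(encoder(inputs))                         # (8, 16, vocab_size)
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])  # predict only the masked tokens
loss.backward()
print(float(loss))
```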

LLM-Pruner: On the structural pruning of large language models

X Ma, G Fang, X Wang - Advances in neural information …, 2023 - proceedings.neurips.cc
Large language models (LLMs) have shown remarkable capabilities in language
understanding and generation. However, such impressive capability typically comes with a …
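For context on what "structural" pruning refers to here, the sketch below removes whole output neurons of a linear layer by weight-norm magnitude. This is a generic illustration under assumed sizes, not LLM-Pruner's gradient-based, dependency-aware criterion.

```python
import torch
import torch.nn as nn

def prune_linear_rows(layer: nn.Linear, keep_ratio: float = 0.5) -> nn.Linear:
    """Structured pruning sketch: keep the output neurons with the largest L2 weight norm."""
    importance = layer.weight.norm(dim=1)                  # one score per output neuron (row)
    n_keep = max(1, int(keep_ratio * layer.out_features))
    keep = importance.topk(n_keep).indices.sort().values   # indices of surviving rows

    pruned = nn.Linear(layer.in_features, n_keep, bias=layer.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(layer.weight[keep])
        if layer.bias is not None:
            pruned.bias.copy_(layer.bias[keep])
    return pruned

layer = nn.Linear(512, 2048)
print(prune_linear_rows(layer, keep_ratio=0.25))            # Linear(512 -> 512)
```

In a full model the next layer's input columns would have to be pruned to match, and a short recovery fine-tuning stage would typically follow.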

LoSparse: Structured compression of large language models based on low-rank and sparse approximation

Y Li, Y Yu, Q Zhang, C Liang, P He… - International …, 2023 - proceedings.mlr.press
Transformer models have achieved remarkable results in various natural language tasks,
but they are often prohibitively large, requiring massive memories and computational …
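To illustrate the low-rank-plus-sparse idea named in the title, a plain SVD-based decomposition W ≈ UV + S can be sketched as follows. LoSparse itself learns the factors during training with importance-based pruning of the sparse component, which is not reproduced here; rank and sparsity values are arbitrary.

```python
import torch

def lowrank_plus_sparse(W: torch.Tensor, rank: int = 8, sparsity: float = 0.01):
    """Approximate W as U @ V (low-rank) plus a sparse residual S (illustrative only)."""
    U_full, sing, Vt = torch.linalg.svd(W, full_matrices=False)
    U = U_full[:, :rank] * sing[:rank]            # absorb singular values into U
    V = Vt[:rank]
    residual = W - U @ V

    k = max(1, int(sparsity * W.numel()))         # keep only the k largest residual entries
    thresh = residual.abs().flatten().topk(k).values.min()
    S = torch.where(residual.abs() >= thresh, residual, torch.zeros_like(residual))
    return U, V, S

W = torch.randn(256, 256)
U, V, S = lowrank_plus_sparse(W, rank=16, sparsity=0.02)
print("relative reconstruction error:", (torch.norm(W - (U @ V + S)) / torch.norm(W)).item())
```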

Compression of generative pre-trained language models via quantization

C Tao, L Hou, W Zhang, L Shang, X Jiang, Q Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
The increasing size of generative Pre-trained Language Models (PLMs) has greatly
increased the demand for model compression. Despite various methods to compress BERT …
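As a rough illustration of weight quantization in general (not the quantization-aware training studied in the cited paper), a symmetric per-tensor int8 round-to-nearest scheme looks like this:

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor int8 round-to-nearest quantization (illustrative only)."""
    scale = w.abs().max() / 127.0                         # map the largest weight magnitude to 127
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(768, 768)
q, scale = quantize_int8(w)
print("mean abs quantization error:", (w - dequantize(q, scale)).abs().mean().item())
```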

Compressing large-scale transformer-based models: A case study on BERT

P Ganesh, Y Chen, X Lou, MA Khan, Y Yang… - Transactions of the …, 2021 - direct.mit.edu
Pre-trained Transformer-based models have achieved state-of-the-art performance for
various Natural Language Processing (NLP) tasks. However, these models often have …

Less is more: Task-aware layer-wise distillation for language model compression

C Liang, S Zuo, Q Zhang, P He… - … on Machine Learning, 2023 - proceedings.mlr.press
Layer-wise distillation is a powerful tool to compress large models (i.e., teacher models) into
small ones (i.e., student models). The student distills knowledge from the teacher by …
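A generic layer-wise distillation loss, matching intermediate hidden states plus softened output logits, can be sketched as below. The task-aware filtering that the cited paper adds is omitted, and all dimensions and weights are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Generic layer-wise distillation loss (illustrative; the cited paper adds task-aware filtering).
d_teacher, d_student, vocab = 768, 384, 1000
proj = nn.Linear(d_student, d_teacher)            # map student hidden size to the teacher's

def distill_loss(t_hidden, s_hidden, t_logits, s_logits, T=2.0, alpha=0.5):
    # Match intermediate representations layer by layer (paired lists of hidden states).
    layer_loss = sum(F.mse_loss(proj(s), t) for s, t in zip(s_hidden, t_hidden))
    # Match softened output distributions (standard KD term).
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(t_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    return alpha * layer_loss + (1 - alpha) * kd

t_hidden = [torch.randn(8, 16, d_teacher) for _ in range(4)]
s_hidden = [torch.randn(8, 16, d_student, requires_grad=True) for _ in range(4)]
loss = distill_loss(t_hidden, s_hidden,
                    torch.randn(8, vocab),
                    torch.randn(8, vocab, requires_grad=True))
loss.backward()
print(float(loss))
```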

Wasserstein contrastive representation distillation

L Chen, D Wang, Z Gan, J Liu… - Proceedings of the …, 2021 - openaccess.thecvf.com
The primary goal of knowledge distillation (KD) is to encapsulate the information of a model
learned from a teacher network into a student network, with the latter being more compact …

Not all negatives are equal: Label-aware contrastive loss for fine-grained text classification

V Suresh, DC Ong - arXiv preprint arXiv:2109.05427, 2021 - arxiv.org
Fine-grained classification involves datasets with a larger number of classes and
subtle differences between them. Guiding the model to focus on differentiating dimensions …
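For reference, a generic supervised (label-aware) contrastive loss, in which embeddings sharing a label are treated as positives, can be sketched as follows; the cited paper's specific weighting of negatives by label information is not reproduced, and the temperature and sizes are assumptions.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(z: torch.Tensor, labels: torch.Tensor, tau: float = 0.1):
    """Generic supervised contrastive loss: same-label embeddings are pulled together (sketch only)."""
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / tau                                  # pairwise cosine similarities
    mask_self = torch.eye(len(z), dtype=torch.bool)
    sim = sim.masked_fill(mask_self, -1e9)                 # exclude self-comparisons

    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)    # log softmax over all other samples
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~mask_self
    # Average log-probability of each anchor's positives; skip anchors with no positive.
    per_anchor = (log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return -per_anchor[pos.any(1)].mean()

z = torch.randn(16, 128, requires_grad=True)
labels = torch.randint(0, 4, (16,))
loss = supervised_contrastive_loss(z, labels)
loss.backward()
print(float(loss))
```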

Compressing visual-linguistic model via knowledge distillation

Z Fang, J Wang, X Hu, L Wang… - Proceedings of the …, 2021 - openaccess.thecvf.com
Despite exciting progress in pre-training for visual-linguistic (VL) representations, very few
aspire to a small VL model. In this paper, we study knowledge distillation (KD) to effectively …