Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
A primer on contrastive pretraining in language processing: Methods, lessons learned, and perspectives
Modern natural language processing (NLP) methods employ self-supervised pretraining
objectives such as masked language modeling to boost the performance of various …
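The snippet above names masked language modeling as a representative self-supervised pretraining objective. As context only, here is a minimal sketch of BERT-style input masking (roughly 15% of positions become prediction targets, split 80/10/10 between [MASK], a random token, and keeping the original); the token IDs and vocabulary size are placeholders, not values from the paper.

```python
import random

def mask_tokens(token_ids, mask_id, vocab_size, mask_prob=0.15, seed=0):
    """BERT-style masking: select ~15% of positions as prediction targets,
    replace 80% of them with [MASK], 10% with a random token, keep 10%."""
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)   # -100 = PyTorch's default ignore index
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok            # the model must predict the original token here
            roll = rng.random()
            if roll < 0.8:
                inputs[i] = mask_id                    # replace with [MASK]
            elif roll < 0.9:
                inputs[i] = rng.randrange(vocab_size)  # replace with a random token
            # else: keep the original token unchanged
    return inputs, labels

corrupted, targets = mask_tokens([101, 7592, 2088, 102], mask_id=103, vocab_size=30522)
```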
LLM-Pruner: On the structural pruning of large language models
Large language models (LLMs) have shown remarkable capabilities in language
understanding and generation. However, such impressive capability typically comes with a …
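The LLM-Pruner snippet only gives the motivation, so the sketch below is a generic illustration of structured pruning (dropping whole output rows of a linear layer by L2-norm importance); it is not the paper's own criterion, which relies on gradient-based importance over coupled structures.

```python
import numpy as np

def prune_rows(weight, bias, keep_ratio=0.5):
    """Structured pruning of a linear layer y = W x + b: rank output units
    by the L2 norm of their weight rows and keep only the strongest ones."""
    importance = np.linalg.norm(weight, axis=1)        # one score per output unit
    n_keep = max(1, int(keep_ratio * weight.shape[0]))
    kept = np.sort(np.argsort(importance)[-n_keep:])   # indices of retained rows
    return weight[kept], bias[kept], kept

W = np.random.randn(8, 16)
b = np.random.randn(8)
W_small, b_small, kept_rows = prune_rows(W, b, keep_ratio=0.5)
print(W_small.shape)   # (4, 16): the pruned layer now has 4 output units
```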
LoSparse: Structured compression of large language models based on low-rank and sparse approximation
Transformer models have achieved remarkable results in various natural language tasks,
but they are often prohibitively large, requiring massive memories and computational …
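The LoSparse title points to approximating each weight matrix as a low-rank term plus a sparse term, W ≈ UV + S. The one-shot numpy sketch below (truncated SVD for the low-rank part, largest-magnitude residual entries for the sparse part) is only meant to make that decomposition concrete; the paper learns these factors during training rather than computing them post hoc.

```python
import numpy as np

def low_rank_plus_sparse(W, rank=8, sparsity=0.02):
    """Approximate W ~= U @ V + S with a rank-`rank` factor and a sparse
    residual that keeps only the `sparsity` fraction of largest entries."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]      # best rank-r approximation
    residual = W - low_rank
    k = int(sparsity * residual.size)
    thresh = np.sort(np.abs(residual), axis=None)[-k] if k > 0 else np.inf
    S = np.where(np.abs(residual) >= thresh, residual, 0.0)  # keep largest residuals
    return low_rank, S

W = np.random.randn(64, 64)
L, S = low_rank_plus_sparse(W)
rel_err = np.linalg.norm(W - (L + S)) / np.linalg.norm(W)
```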
Compression of generative pre-trained language models via quantization
The increasing size of generative Pre-trained Language Models (PLMs) has greatly
increased the demand for model compression. Despite various methods to compress BERT …
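As background for the quantization entry above, here is a minimal sketch of plain post-training symmetric per-tensor int8 quantization of a weight matrix; the paper targets generative PLMs with more elaborate, learned quantizers, so this is illustrative only.

```python
import numpy as np

def quantize_int8(W):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]
    with a single scale factor."""
    scale = np.abs(W).max() / 127.0
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float matrix to inspect the rounding error."""
    return q.astype(np.float32) * scale

W = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(W)
rel_err = np.linalg.norm(W - dequantize(q, scale)) / np.linalg.norm(W)
```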
Compressing large-scale transformer-based models: A case study on BERT
Pre-trained Transformer-based models have achieved state-of-the-art performance for
various Natural Language Processing (NLP) tasks. However, these models often have …
Less is more: Task-aware layer-wise distillation for language model compression
Layer-wise distillation is a powerful tool to compress large models (i.e., teacher models) into
small ones (i.e., student models). The student distills knowledge from the teacher by …
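The layer-wise distillation entry describes the student matching the teacher layer by layer. A minimal PyTorch sketch of that idea follows: an MSE term between (projected) student and teacher hidden states per layer, plus a temperature-scaled KL term on the logits. The task-aware weighting proposed in the paper is not reproduced; the layer pairing and projection here are generic assumptions.

```python
import torch
import torch.nn.functional as F

def layerwise_distill_loss(student_hidden, teacher_hidden, student_logits,
                           teacher_logits, proj, temperature=2.0, alpha=0.5):
    """Layer-wise distillation: match each student hidden state to the
    corresponding teacher layer (after projecting to the teacher width),
    plus a soft-label KL term on the output logits."""
    layer_loss = sum(
        F.mse_loss(proj(h_s), h_t)
        for h_s, h_t in zip(student_hidden, teacher_hidden)
    ) / len(student_hidden)
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * layer_loss + (1 - alpha) * soft_loss

# toy example: 4 student layers matched to 4 selected teacher layers
proj = torch.nn.Linear(256, 768)   # student width -> teacher width
student_hidden = [torch.randn(8, 16, 256) for _ in range(4)]
teacher_hidden = [torch.randn(8, 16, 768) for _ in range(4)]
loss = layerwise_distill_loss(student_hidden, teacher_hidden,
                              torch.randn(8, 10), torch.randn(8, 10), proj)
```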
Wasserstein contrastive representation distillation
The primary goal of knowledge distillation (KD) is to encapsulate the information of a model
learned from a teacher network into a student network, with the latter being more compact …
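The entry above frames distillation as compressing a teacher's representation into a compact student. For context, the sketch below shows a generic InfoNCE-style contrastive distillation objective, where the student embedding of an input is pulled toward the teacher embedding of the same input and pushed away from other samples in the batch; the paper's Wasserstein-based formulation is not shown here.

```python
import torch
import torch.nn.functional as F

def contrastive_distill_loss(student_feats, teacher_feats, temperature=0.1):
    """InfoNCE-style distillation: for each sample, the matching teacher
    feature is the positive and the other samples in the batch are negatives."""
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(teacher_feats, dim=-1)
    logits = s @ t.T / temperature        # (batch, batch) similarity matrix
    targets = torch.arange(s.size(0))     # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

loss = contrastive_distill_loss(torch.randn(32, 128), torch.randn(32, 128))
```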
Not all negatives are equal: Label-aware contrastive loss for fine-grained text classification
Fine-grained classification involves dealing with datasets with a larger number of classes that have
subtle differences between them. Guiding the model to focus on differentiating dimensions …
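For the label-aware contrastive loss entry, the sketch below is a plain supervised contrastive loss, in which samples sharing a label act as positives; the label-aware weighting of negatives suggested by the paper's title is not reproduced, so treat this only as the baseline objective it builds on.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss: samples sharing a label are positives,
    every other sample in the batch is a negative."""
    z = F.normalize(features, dim=-1)
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool)
    sim = (z @ z.T / temperature).masked_fill(eye, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)   # exclude self-pairs
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~eye
    pos_counts = pos_mask.sum(1)
    valid = pos_counts > 0                 # anchors with at least one positive
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(1)
    return -(pos_log_prob[valid] / pos_counts[valid]).mean()

feats, labels = torch.randn(16, 64), torch.randint(0, 4, (16,))
loss = supervised_contrastive_loss(feats, labels)
```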
Compressing visual-linguistic model via knowledge distillation
Despite exciting progress in pre-training for visual-linguistic (VL) representations, very few
aspire to a small VL model. In this paper, we study knowledge distillation (KD) to effectively …
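The last entry studies knowledge distillation for small visual-linguistic models, but the truncated snippet does not say which signals are distilled. Matching attention distributions is one common ingredient in transformer distillation, so the sketch below illustrates that generic idea rather than this paper's method; the layer pairing and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def attention_distill_loss(student_attn, teacher_attn, eps=1e-8):
    """Match the student's self-attention distributions to the teacher's with
    a KL term, averaged over layers. Each tensor has shape
    (batch, heads, seq, seq) and already sums to 1 over the last dimension."""
    loss = 0.0
    for a_s, a_t in zip(student_attn, teacher_attn):
        loss = loss + F.kl_div((a_s + eps).log(), a_t, reduction="batchmean")
    return loss / len(student_attn)

# toy example: 2 layer pairs, 4 heads, sequence length 16
student_attn = [torch.softmax(torch.randn(8, 4, 16, 16), dim=-1) for _ in range(2)]
teacher_attn = [torch.softmax(torch.randn(8, 4, 16, 16), dim=-1) for _ in range(2)]
loss = attention_distill_loss(student_attn, teacher_attn)
```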