Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
A review of generalized zero-shot learning methods
Generalized zero-shot learning (GZSL) aims to train a model for classifying data samples
under the condition that some output classes are unknown during supervised learning. To …
Open-vocabulary object detection via vision and language knowledge distillation
We aim at advancing open-vocabulary object detection, which detects objects described by
arbitrary text inputs. The fundamental challenge is the availability of training data. It is costly …
Decoupling zero-shot semantic segmentation
Zero-shot semantic segmentation (ZS3) aims to segment the novel categories that have not
been seen in the training. Existing works formulate ZS3 as a pixel-level zero-shot …
ELEVATER: A benchmark and toolkit for evaluating language-augmented visual models
Learning visual representations from natural language supervision has recently shown great
promise in a number of pioneering works. In general, these language-augmented visual …
PromptDet: Towards open-vocabulary detection using uncurated images
The goal of this work is to establish a scalable pipeline for expanding an object detector
towards novel/unseen categories, using zero manual annotations. To achieve that, we make …
A survey of zero-shot learning: Settings, methods, and applications
Most machine-learning methods focus on classifying instances whose classes have already
been seen in training. In practice, many applications require classifying instances whose …
KRISP: Integrating implicit and symbolic knowledge for open-domain knowledge-based VQA
One of the most challenging question types in VQA is when answering the question requires
outside knowledge not present in the image. In this work we study open-domain knowledge …
Feature generating networks for zero-shot learning
Suffering from the extreme training data imbalance between seen and unseen classes, most
of existing state-of-the-art approaches fail to achieve satisfactory results for the challenging …
Zero-shot learning—a comprehensive evaluation of the good, the bad and the ugly
Due to the importance of zero-shot learning, i.e., classifying images where there is a lack of
labeled training data, the number of proposed approaches has recently increased steadily …