Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Digital twin in the IoT context: A survey on technical features, scenarios, and architectural models

R Minerva, GM Lee, N Crespi - Proceedings of the IEEE, 2020 - ieeexplore.ieee.org
Digital twin (DT) is an emerging concept that is gaining attention in various industries. It
refers to the ability to clone a physical object (PO) into a software counterpart. The …

Open-vocabulary object detection via vision and language knowledge distillation

X Gu, TY Lin, W Kuo, Y Cui - arXiv preprint arXiv:2104.13921, 2021 - arxiv.org
We aim at advancing open-vocabulary object detection, which detects objects described by
arbitrary text inputs. The fundamental challenge is the availability of training data. It is costly …

Learning concise and descriptive attributes for visual recognition

A Yan, Y Wang, Y Zhong, C Dong… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent advances in foundation models present new opportunities for interpretable visual
recognition--one can first query Large Language Models (LLMs) to obtain a set of attributes …

Align and prompt: Video-and-language pre-training with entity prompts

D Li, J Li, H Li, JC Niebles… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Video-and-language pre-training has shown promising improvements on various
downstream tasks. Most previous methods capture cross-modal interactions with a …

ELEVATER: A benchmark and toolkit for evaluating language-augmented visual models

C Li, H Liu, L Li, P Zhang, J Aneja… - Advances in …, 2022 - proceedings.neurips.cc
Learning visual representations from natural language supervision has recently shown great
promise in a number of pioneering works. In general, these language-augmented visual …

Contrastive embedding for generalized zero-shot learning

Z Han, Z Fu, S Chen, J Yang - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Generalized zero-shot learning (GZSL) aims to recognize objects from both seen and
unseen classes, when only the labeled examples from seen classes are provided. Recent …

A review of generalized zero-shot learning methods

F Pourpanah, M Abdar, Y Luo, X Zhou… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Generalized zero-shot learning (GZSL) aims to train a model for classifying data samples
under the condition that some output classes are unknown during supervised learning. To …

Progressive semantic-visual mutual adaption for generalized zero-shot learning

M Liu, F Li, C Zhang, Y Wei, H Bai… - Proceedings of the …, 2023 - openaccess.thecvf.com
Generalized Zero-Shot Learning (GZSL) identifies unseen categories by knowledge
transferred from the seen domain, relying on the intrinsic interactions between visual and …

Counterfactual zero-shot and open-set visual recognition

Z Yue, T Wang, Q Sun, XS Hua… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present a novel counterfactual framework for both Zero-Shot Learning (ZSL) and
Open-Set Recognition (OSR), whose common challenge is generalizing to the unseen-classes by …