Vision-language models for vision tasks: A survey

J Zhang, J Huang, S Jin, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks
(DNNs) training, and they usually train a DNN for each single visual recognition task …

YOLO-World: Real-time open-vocabulary object detection

T Cheng, L Song, Y Ge, W Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
The You Only Look Once (YOLO) series of detectors have established themselves
as efficient and practical tools. However, their reliance on predefined and trained object …

Aligning bag of regions for open-vocabulary object detection

S Wu, W Zhang, S Jin, W Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Pre-trained vision-language models (VLMs) learn to align vision and language
representations on large-scale datasets, where each image-text pair usually contains a bag …

CoDet: Co-occurrence guided region-word alignment for open-vocabulary object detection

C Ma, Y Jiang, X Wen, Z Yuan… - Advances in neural …, 2024 - proceedings.neurips.cc
Deriving reliable region-word alignment from image-text pairs is critical to learn object-level
vision-language representations for open-vocabulary object detection. Existing methods …

Towards open vocabulary learning: A survey

J Wu, X Li, S Xu, H Yuan, H Ding… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …

EdaDet: Open-vocabulary object detection using early dense alignment

C Shi, S Yang - Proceedings of the IEEE/CVF international …, 2023 - openaccess.thecvf.com
Vision-language models such as CLIP have boosted the performance of open-vocabulary
object detection, where the detector is trained on base categories but required to detect …

A survey on open-vocabulary detection and segmentation: Past, present, and future

C Zhu, L Chen - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in deep learning era. Due to the expensive manual …

General object foundation model for images and videos at scale

J Wu, Y Jiang, Q Liu, Z Yuan… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present GLEE in this work, an object-level foundation model for locating and identifying
objects in images and videos. Through a unified framework, GLEE accomplishes detection …

Zero-shot referring image segmentation with global-local context features

S Yu, PH Seo, J Son - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Referring image segmentation (RIS) aims to find a segmentation mask given a referring
expression grounded to a region of the input image. Collecting labelled datasets for this …

Going denser with open-vocabulary part segmentation

P Sun, S Chen, C Zhu, F Xiao, P Luo… - Proceedings of the …, 2023 - openaccess.thecvf.com
Object detection has been expanded from a limited number of categories to open
vocabulary. Moving forward, a complete intelligent vision system requires understanding …