Vision-language models for vision tasks: A survey
Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks
(DNNs) training, and they usually train a DNN for each single visual recognition task …
(DNNs) training, and they usually train a DNN for each single visual recognition task …
Yolo-world: Real-time open-vocabulary object detection
Abstract The You Only Look Once (YOLO) series of detectors have established themselves
as efficient and practical tools. However their reliance on predefined and trained object …
as efficient and practical tools. However their reliance on predefined and trained object …
Aligning bag of regions for open-vocabulary object detection
Pre-trained vision-language models (VLMs) learn to align vision and language
representations on large-scale datasets, where each image-text pair usually contains a bag …
representations on large-scale datasets, where each image-text pair usually contains a bag …
Codet: Co-occurrence guided region-word alignment for open-vocabulary object detection
Deriving reliable region-word alignment from image-text pairs is critical to learnobject-level
vision-language representations for open-vocabulary object detection. Existing methods …
vision-language representations for open-vocabulary object detection. Existing methods …
Towards open vocabulary learning: A survey
In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …
advancements in various core tasks like segmentation, tracking, and detection. However …
Edadet: Open-vocabulary object detection using early dense alignment
Vision-language models such as CLIP have boosted the performance of open-vocabulary
object detection, where the detector is trained on base categories but required to detect …
object detection, where the detector is trained on base categories but required to detect …
A survey on open-vocabulary detection and segmentation: Past, present, and future
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in deep learning era. Due to the expensive manual …
have made tremendous progress in deep learning era. Due to the expensive manual …
General object foundation model for images and videos at scale
We present GLEE in this work an object-level foundation model for locating and identifying
objects in images and videos. Through a unified framework GLEEaccomplishes detection …
objects in images and videos. Through a unified framework GLEEaccomplishes detection …
Zero-shot referring image segmentation with global-local context features
Referring image segmentation (RIS) aims to find a segmentation mask given a referring
expression grounded to a region of the input image. Collecting labelled datasets for this …
expression grounded to a region of the input image. Collecting labelled datasets for this …
Going denser with open-vocabulary part segmentation
Object detection has been expanded from a limited number of categories to open
vocabulary. Moving forward, a complete intelligent vision system requires understanding …
vocabulary. Moving forward, a complete intelligent vision system requires understanding …