- Academic Search

A survey on open-vocabulary detection and segmentation: Past, present, and future

C Zhu, L Chen - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org

As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in deep learning era. Due to the expensive manual …

Cited by 24 · All 8 versions

Vision-language models for vision tasks: A survey

J Zhang, J Huang, S Jin, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org

Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks
(DNNs) training, and they usually train a DNN for each single visual recognition task …

Cited by 466 · All 11 versions

Yolo-world: Real-time open-vocabulary object detection

T Cheng, L Song, Y Ge, W Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Abstract The You Only Look Once (YOLO) series of detectors have established themselves
as efficient and practical tools. However, their reliance on predefined and trained object …

Cited by 235 · All 7 versions

Towards open vocabulary learning: A survey

J Wu, X Li, S Xu, H Yuan, H Ding… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …

Cited by 127 · All 13 versions

Aligning bag of regions for open-vocabulary object detection

S Wu, W Zhang, S Jin, W Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Pre-trained vision-language models (VLMs) learn to align vision and language
representations on large-scale datasets, where each image-text pair usually contains a bag …

Cited by 114 · All 7 versions

Codet: Co-occurrence guided region-word alignment for open-vocabulary object detection

C Ma, Y Jiang, X Wen, Z Yuan… - Advances in neural …, 2023 - proceedings.neurips.cc

Deriving reliable region-word alignment from image-text pairs is critical to learn object-level
vision-language representations for open-vocabulary object detection. Existing methods …

Cited by 49 · All 7 versions

General object foundation model for images and videos at scale

J Wu, Y Jiang, Q Liu, Z Yuan… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present GLEE in this work, an object-level foundation model for locating and identifying
objects in images and videos. Through a unified framework, GLEE accomplishes detection …

Cited by 42 · All 7 versions

Zero-shot referring image segmentation with global-local context features

S Yu, PH Seo, J Son - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com

Referring image segmentation (RIS) aims to find a segmentation mask given a referring
expression grounded to a region of the input image. Collecting labelled datasets for this …

Cited by 57 · All 7 versions

Going denser with open-vocabulary part segmentation

P Sun, S Chen, C Zhu, F Xiao, P Luo… - Proceedings of the …, 2023 - openaccess.thecvf.com

Object detection has been expanded from a limited number of categories to open
vocabulary. Moving forward, a complete intelligent vision system requires understanding …

Cited by 43 · All 8 versions

Vitamin: Designing scalable vision models in the vision-language era

J Chen, Q Yu, X Shen, A Yuille… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recent breakthroughs in vision-language models (VLMs) start a new page in the vision
community. The VLMs provide stronger and more generalizable feature embeddings …

Cited by 13 · All 9 versions

Learning object-language alignments for open-vocabulary object detection

A survey on open-vocabulary detection and segmentation: Past, present, and future