A survey on open-vocabulary detection and segmentation: Past, present, and future

C Zhu, L Chen - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in the deep learning era. Due to the expensive manual …

Towards open vocabulary learning: A survey

J Wu, X Li, S Xu, H Yuan, H Ding… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …

Omg-seg: Is one model good enough for all segmentation?

X Li, H Yuan, W Li, H Ding, S Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this work, we address various segmentation tasks, each traditionally tackled by distinct or
partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently …

Sclip: Rethinking self-attention for dense vision-language inference

F Wang, J Mei, A Yuille - European Conference on Computer Vision, 2024 - Springer
Recent advances in contrastive language-image pretraining (CLIP) have demonstrated
strong capabilities in zero-shot classification by aligning visual and textual features at an …

Vitamin: Designing scalable vision models in the vision-language era

J Chen, Q Yu, X Shen, A Yuille… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent breakthroughs in vision-language models (VLMs) open a new chapter for the vision
community. VLMs provide stronger and more generalizable feature embeddings …

Pink: Unveiling the power of referential comprehension for multi-modal llms

S Xuan, Q Guo, M Yang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Multi-modal Large Language Models (MLLMs) have shown remarkable capabilities
in various multi-modal tasks. Nevertheless, their performance in fine-grained image …

Proxyclip: Proxy attention improves clip for open-vocabulary segmentation

M Lan, C Chen, Y Ke, X Wang, L Feng… - European Conference on …, 2024 - Springer
Open-vocabulary semantic segmentation requires models to effectively integrate visual
representations with open-vocabulary semantic labels. While Contrastive Language-Image …

Clearclip: Decomposing clip representations for dense vision-language inference

M Lan, C Chen, Y Ke, X Wang, L Feng… - European Conference on …, 2024 - Springer
Despite the success of large-scale pretrained Vision-Language Models (VLMs) especially
CLIP in various open-vocabulary tasks, their application to semantic segmentation remains …

DAC-DETR: Divide the attention layers and conquer

Z Hu, Y Sun, J Wang, Y Yang - Advances in Neural …, 2023 - proceedings.neurips.cc
This paper reveals a characteristic of the DEtection TRansformer (DETR) that negatively impacts
its training efficacy, i.e., the cross-attention and self-attention layers in the DETR decoder have …

Exploring regional clues in CLIP for zero-shot semantic segmentation

Y Zhang, MH Guo, M Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
CLIP has demonstrated marked progress in visual recognition due to its powerful pretraining
on large-scale image-text pairs. However, a critical challenge remains: how to …