Grounding everything: Emerging localization properties in vision-language transformers
Vision-language foundation models have shown remarkable performance in various zero-shot settings such as image retrieval, classification, or captioning. But so far those models …
CLIP as RNN: Segment countless visual concepts without training endeavor
Existing open-vocabulary image segmentation methods require a fine-tuning step on mask labels and/or image-text datasets. Mask labels are labor-intensive, which limits the number of …
Diffusion feedback helps CLIP see better
Contrastive Language-Image Pre-training (CLIP), which excels at abstracting open-world representations across domains and modalities, has become a foundation for a variety of …
Decoupling static and hierarchical motion perception for referring video segmentation
Referring video segmentation relies on natural language expressions to identify and segment objects, often emphasizing motion clues. Previous works treat a sentence as a …
Zero-shot referring expression comprehension via structural similarity between images and captions
Zero-shot referring expression comprehension aims at localizing bounding boxes in an image corresponding to provided textual prompts, which requires: (i) a fine-grained …
PrimitiveNet: Decomposing the global constraints for referring segmentation
In referring segmentation, modeling the complicated constraints in the multimodal information is one of the most challenging problems. As the information in a given language …
RESMatch: Referring expression segmentation in a semi-supervised manner
Referring expression segmentation (RES), a task that involves localizing specific instance-level objects on the basis of free-form linguistic descriptions, has emerged as a crucial …
Text promptable surgical instrument segmentation with vision-language models
In this paper, we propose a novel text promptable surgical instrument segmentation approach to overcome challenges associated with diversity and differentiation of surgical …
Ref-diff: Zero-shot referring image segmentation with generative models
Zero-shot referring image segmentation is a challenging task because it aims to find an instance segmentation mask based on the given referring descriptions, without training on …
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation
Referring expression segmentation (RES) aims at segmenting the foreground masks of the entities that match the descriptive natural language expression. Previous datasets and …