BRAVE: Broadening the visual encoding of vision-language models
Vision-language models (VLMs) are typically composed of a vision encoder, e.g., CLIP, and a
language model (LM) that interprets the encoded features to solve downstream tasks …
A survey on open-vocabulary detection and segmentation: Past, present, and future
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in the deep learning era. Due to the expensive manual …
Proxyclip: Proxy attention improves clip for open-vocabulary segmentation
Open-vocabulary semantic segmentation requires models to effectively integrate visual
representations with open-vocabulary semantic labels. While Contrastive Language-Image …
Improving medical multi-modal contrastive learning with expert annotations
We introduce eCLIP, an enhanced version of the CLIP model that integrates expert
annotations in the form of radiologist eye-gaze heatmaps. It tackles key challenges in …
SemiVL: semi-supervised semantic segmentation with vision-language guidance
In semi-supervised semantic segmentation, a model is trained with a limited number of
labeled images along with a large corpus of unlabeled images to reduce the high annotation …
Contrastive localized language-image pre-training
Contrastive Language-Image Pre-training (CLIP) has been a celebrated method for training
vision encoders to generate image/text representations facilitating various applications …
Image segmentation in foundation model era: A survey
Image segmentation is a long-standing challenge in computer vision, studied continuously
over several decades, as evidenced by seminal algorithms such as N-Cut, FCN, and …
Unimed-clip: Towards a unified image-text pretraining paradigm for diverse medical imaging modalities
Vision-Language Models (VLMs) trained via contrastive learning have achieved notable
success in natural image tasks. However, their application in the medical domain remains …
Active data curation effectively distills large-scale multimodal models
Knowledge distillation (KD) is the de facto standard for compressing large-scale models into
smaller ones. Prior works have explored ever more complex KD strategies involving different …
Human Pose Descriptions and Subject-Focused Attention for Improved Zero-Shot Transfer in Human-Centric Classification Tasks
We present a novel LLM-based pipeline for creating contextual descriptions of human body
poses in images using only auxiliary attributes. This approach facilitates the creation of the …