Google Scholar

Shifting more attention to visual backbone: Query-modulated refinement networks for end-to-end...

MPCCT: Multimodal vision-language learning paradigm with context-based compact Transformer

C Chen, D Han, CC Chang - Pattern Recognition, 2024 - Elsevier

Transformer and its variants have become the preferred option for multimodal vision-
language paradigms. However, they struggle with tasks that demand high-dependency …

Cited by 101 · All 3 versions

[PDF] arxiv.org

RSVG: Exploring data and models for visual grounding on remote sensing data

Y Zhan, Z Xiong, Y Yuan - IEEE Transactions on Geoscience …, 2023 - ieeexplore.ieee.org

In this article, we introduce the task of visual grounding for remote sensing data (RSVG).
RSVG aims to localize the referred objects in remote sensing (RS) images with the guidance …

Cited by 108 · All 5 versions

[PDF] thecvf.com

Language adaptive weight generation for multi-task visual grounding

W Su, P Miao, H Dou, G Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Although the impressive performance in visual grounding, the prevailing approaches usually
exploit the visual backbone in a passive way, i.e., the visual backbone extracts features with …

Cited by 44 · All 9 versions

[PDF] thecvf.com

X-Mesh: Towards fast and accurate text-driven 3D stylization via dynamic textual guidance

Y Ma, X Zhang, X Sun, J Ji, H Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Text-driven 3D stylization is a complex and crucial task in the fields of computer vision (CV)
and computer graphics (CG), aimed at transforming a bare mesh to fit a target text. Prior …

Cited by 36 · All 5 versions

[PDF] arxiv.org

TransVG++: End-to-end visual grounding with language conditioned vision transformer

J Deng, Z Yang, D Liu, T Chen, W Zhou… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

In this work, we explore neat yet effective Transformer-based frameworks for visual
grounding. The previous methods generally address the core problem of visual grounding …

Cited by 59 · All 7 versions

[PDF] aclanthology.org

Grounded multimodal named entity recognition on social media

J Yu, Z Li, J Wang, R Xia - … of the 61st Annual Meeting of the …, 2023 - aclanthology.org

In recent years, Multimodal Named Entity Recognition (MNER) on social media has
attracted considerable attention. However, existing MNER studies only extract entity-type …

Cited by 28 · All 2 versions

LGR-NET: Language guided reasoning network for referring expression comprehension

M Lu, R Li, F Feng, Z Ma, X Wang - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Referring Expression Comprehension (REC) is a fundamental task in the vision and
language domain, which aims to locate an image region according to a natural language …

Cited by 17 · All 3 versions

[PDF] thecvf.com

ScanFormer: Referring expression comprehension by iteratively scanning

W Su, P Miao, H Dou, X Li - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Referring Expression Comprehension (REC) aims to localize the target objects
specified by free-form natural language descriptions in images. While state-of-the-art …

Cited by 7 · All 6 versions

[PDF] aaai.org

Unifying visual and vision-language tracking via contrastive learning

Y Ma, Y Tang, W Yang, T Zhang, J Zhang… - Proceedings of the AAAI …, 2024 - ojs.aaai.org

Single object tracking aims to locate the target object in a video sequence according to the
state specified by different modal references, including the initial bounding box (BBOX) …

Cited by 16 · All 6 versions

Language-guided progressive attention for visual grounding in remote sensing images

K Li, D Wang, H Xu, H Zhong… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Visual grounding in remote sensing (RSVG) images aims to detect specific objects
associated with referring expressions in remote sensing images. Existing methods typically …

Cited by 11 · All 3 versions