MPCCT: Multimodal vision-language learning paradigm with context-based compact Transformer
C Chen, D Han, CC Chang - Pattern Recognition, 2024 - Elsevier
Transformer and its variants have become the preferred option for multimodal vision-
language paradigms. However, they struggle with tasks that demand high-dependency …
RSVG: Exploring data and models for visual grounding on remote sensing data
In this article, we introduce the task of visual grounding for remote sensing data (RSVG).
RSVG aims to localize the referred objects in remote sensing (RS) images with the guidance …
Language adaptive weight generation for multi-task visual grounding
Despite impressive performance in visual grounding, the prevailing approaches usually
exploit the visual backbone in a passive way, i.e., the visual backbone extracts features with …
X-Mesh: Towards fast and accurate text-driven 3D stylization via dynamic textual guidance
Text-driven 3D stylization is a complex and crucial task in the fields of computer vision (CV)
and computer graphics (CG), aimed at transforming a bare mesh to fit a target text. Prior …
TransVG++: End-to-end visual grounding with language conditioned vision transformer
In this work, we explore neat yet effective Transformer-based frameworks for visual
grounding. The previous methods generally address the core problem of visual grounding …
Grounded multimodal named entity recognition on social media
In recent years, Multimodal Named Entity Recognition (MNER) on social media has
attracted considerable attention. However, existing MNER studies only extract entity-type …
LGR-NET: Language guided reasoning network for referring expression comprehension
Referring Expression Comprehension (REC) is a fundamental task in the vision and
language domain, which aims to locate an image region according to a natural language …
ScanFormer: Referring expression comprehension by iteratively scanning
Referring Expression Comprehension (REC) aims to localize the target objects
specified by free-form natural language descriptions in images. While state-of-the-art …
Unifying visual and vision-language tracking via contrastive learning
Single object tracking aims to locate the target object in a video sequence according to the
state specified by different modal references, including the initial bounding box (BBOX) …
Language-guided progressive attention for visual grounding in remote sensing images
Visual grounding in remote sensing (RSVG) images aims to detect specific objects
associated with referring expressions in remote sensing images. Existing methods typically …