Transformers in vision: A survey
Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …
MPCCT: Multimodal vision-language learning paradigm with context-based compact Transformer
C Chen, D Han, CC Chang - Pattern recognition, 2024 - Elsevier
Transformer and its variants have become the preferred option for multimodal vision-
language paradigms. However, they struggle with tasks that demand high-dependency …
A survey of visual transformers
Transformer, an attention-based encoder–decoder model, has already revolutionized the
field of natural language processing (NLP). Inspired by such significant achievements, some …
Referring transformer: A one-step approach to multi-task visual grounding
As an important step towards visual reasoning, visual grounding (e.g., phrase localization,
referring expression comprehension/segmentation) has been widely explored. Previous …
Local-global context aware transformer for language-guided video segmentation
We explore the task of language-guided video segmentation (LVS). Previous algorithms
mostly adopt 3D CNNs to learn video representation, struggling to capture long-term context …
TransVG++: End-to-end visual grounding with language conditioned vision transformer
In this work, we explore neat yet effective Transformer-based frameworks for visual
grounding. The previous methods generally address the core problem of visual grounding …
Context disentangling and prototype inheriting for robust visual grounding
Visual grounding (VG) aims to locate a specific target in an image based on a given
language query. The discriminative information from context is important for distinguishing …
Transformer-based visual grounding with cross-modality interaction
This article tackles the challenging yet important task of Visual Grounding (VG), which aims
to localize a visual region in the given image referred by a natural language query. Existing …