Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
Vision-language models in remote sensing: Current progress and future trends
The remarkable achievements of ChatGPT and Generative Pre-trained Transformer 4 (GPT-
4) have sparked a wave of interest and research in the field of large language models …
SeqTrack: Sequence to sequence learning for visual object tracking
In this paper, we present a new sequence-to-sequence learning framework for visual
tracking, dubbed SeqTrack. It casts visual tracking as a sequence generation problem …
Universal instance perception as object discovery and retrieval
All instance perception tasks aim at finding certain objects specified by some queries such
as category names, language expressions, and target annotations, but this complete field …
GRES: Generalized referring expression segmentation
Referring Expression Segmentation (RES) aims to generate a segmentation mask
for the object described by a given language expression. Existing classic RES datasets and …
MPCCT: Multimodal vision-language learning paradigm with context-based compact Transformer
C Chen, D Han, CC Chang - Pattern recognition, 2024 - Elsevier
Transformer and its variants have become the preferred option for multimodal vision-
language paradigms. However, they struggle with tasks that demand high-dependency …
MDETR - Modulated detection for end-to-end multi-modal understanding
Multi-modal reasoning systems rely on a pre-trained object detector to extract regions of
interest from the image. However, this crucial module is typically used as a black box …
EarthGPT: A universal multi-modal large language model for multi-sensor image comprehension in remote sensing domain
Multimodal large language models (MLLMs) have demonstrated remarkable success in
vision and visual-language tasks within the natural image domain. Owing to the significant …
Multi3DRefer: Grounding text description to multiple 3D objects
We introduce the task of localizing a flexible number of objects in real-world 3D scenes
using natural language descriptions. Existing 3D visual grounding tasks focus on localizing …
VLT: Vision-language transformer and query generation for referring segmentation
We propose a Vision-Language Transformer (VLT) framework for referring segmentation to
facilitate deep interactions among multi-modal information and enhance the holistic …