CLVIN: Complete language-vision interaction network for visual question answering
C Chen, D Han, X Shen - Knowledge-Based Systems, 2023 - Elsevier
The emergence of the Transformer optimizes the interactive modeling of multimodal
information in visual question answering (VQA) tasks, helping machines better understand …
SGUIE-Net: Semantic attention guided underwater image enhancement with multi-scale perception
Due to the wavelength-dependent light attenuation, refraction and scattering, underwater
images usually suffer from color distortion and blurred details. However, due to the limited …
Region-object relation-aware dense captioning via transformer
Dense captioning provides detailed captions of complex visual scenes. While a number of
successes have been achieved in recent years, there are still two broad limitations: 1) most …
Deep fuzzy hashing network for efficient image retrieval
Hashing methods for efficient image retrieval aim at learning hash functions that map similar
images to semantically correlated binary codes in the Hamming space with similarity well …
FashionVLP: Vision language transformer for fashion retrieval with feedback
Fashion image retrieval based on a query pair of reference image and natural language
feedback is a challenging task that requires models to assess fashion related information …
An overview of recent work in media forensics: Methods and threats
In this paper, we review recent work in media forensics for digital images, video, audio
(specifically speech), and documents. For each data modality, we discuss synthesis and …
Towards multi-modal sarcasm detection via hierarchical congruity modeling with knowledge enhancement
Sarcasm is a linguistic phenomenon indicating a discrepancy between literal meanings and
implied intentions. Due to its sophisticated nature, it is usually challenging to be detected …
Nearest neighbor-based contrastive learning for hyperspectral and LiDAR data classification
The joint hyperspectral image (HSI) and light detection and ranging (LiDAR) data
classification aims to interpret ground objects at more detailed and precise level. Although …
Sarcasm driven by sentiment: A sentiment-aware hierarchical fusion network for multimodal sarcasm detection
Sarcasm is a form of sentiment expression that highlights the disparity between a person's
true intentions and the content they explicitly present. With the exponential increase in …
CoSMo: Content-style modulation for image retrieval with text feedback
We tackle the task of image retrieval with text feedback, where a reference image and
modifier text are combined to identify the desired target image. We focus on designing an …