Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
A review of recurrent neural networks: LSTM cells and network architectures
Y Yu, X Si, C Hu, J Zhang - Neural computation, 2019 - direct.mit.edu
Recurrent neural networks (RNNs) have been widely adopted in research areas concerned
with sequential data, such as text, audio, and video. However, RNNs consisting of sigma …
Deep hierarchical semantic segmentation
Humans are able to recognize structured relations in observation, allowing us to decompose
complex scenes into simpler parts and abstract the visual world in multiple levels. However …
Visual semantic reasoning for image-text matching
Image-text matching has been a hot research topic bridging the vision and language areas.
It remains challenging because the current representation of image usually lacks global …
IMRAM: Iterative matching with recurrent attention memory for cross-modal image-text retrieval
Enabling bi-directional retrieval of images and texts is important for understanding the
correspondence between vision and language. Existing methods leverage the attention …
Hierarchical deep click feature prediction for fine-grained image recognition
The click feature of an image, defined as the user click frequency vector of the image on a
predefined word vocabulary, is known to effectively reduce the semantic gap for fine-grained …
Stacked cross attention for image-text matching
In this paper, we study the problem of image-text matching. Inferring the latent semantic
alignment between objects or other salient stuff (e.g., snow, sky, lawn) and the corresponding …
Context-aware attention network for image-text retrieval
As a typical cross-modal problem, image-text bi-directional retrieval relies heavily on the
joint embedding learning and similarity measure for each image-text pair. It remains …
Semantically self-aligned network for text-to-image part-aware person re-identification
Text-to-image person re-identification (ReID) aims to search for images containing a person
of interest using textual descriptions. However, due to the significant modality gap and the …
CAMP: Cross-modal adaptive message passing for text-image retrieval
Text-image cross-modal retrieval is a challenging task in the field of language and vision.
Most previous approaches independently embed images and sentences into a joint …