Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
From show to tell: A survey on deep learning-based image captioning
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …
TM2T: Stochastic and tokenized modeling for the reciprocal generation of 3D human motions and texts
Inspired by the strong ties between vision and language, the two intimate human sensing
and communication modalities, our paper aims to explore the generation of 3D human full …
Multi-modal knowledge graph construction and application: A survey
Recent years have witnessed the resurgence of knowledge engineering which is featured
by the fast growth of knowledge graphs. However, most of existing knowledge graphs are …
RSTNet: Captioning with adaptive attention on visual and non-visual words
Recent progress on visual question answering has explored the merits of grid features for
vision language tasks. Meanwhile, transformer-based models have shown remarkable …
Attention on attention for image captioning
Attention mechanisms are widely used in current encoder/decoder frameworks of image
captioning, where a weighted average on encoded vectors is generated at each time step to …
VisualGPT: Data-efficient adaptation of pretrained language models for image captioning
The limited availability of annotated data often hinders real-world applications of machine
learning. To efficiently learn from small quantities of multimodal data, we leverage the …
VideoBERT: A joint model for video and language representation learning
Self-supervised learning has become increasingly important to leverage the abundance of
unlabeled data available on platforms like YouTube. Whereas most existing approaches …
Learning conditional attributes for compositional zero-shot learning
Compositional Zero-Shot Learning (CZSL) aims to train models to recognize novel
compositional concepts based on learned concepts such as attribute-object combinations …
Remote sensing image change captioning with dual-branch transformers: A new method and a large scale dataset
Analyzing land cover changes with multitemporal remote sensing (RS) images is crucial for
environmental protection and land planning. In this article, we explore RS image change …