Teaching structured vision & language concepts to vision & language models
Vision and Language (VL) models have demonstrated remarkable zero-shot performance in
a variety of tasks. However, some aspects of complex language understanding still remain a …
LightningDOT: Pre-training visual-semantic embeddings for real-time image-text retrieval
Multimodal pre-training has propelled great advancement in vision-and-language research.
These large-scale pre-trained models, although successful, fatefully suffer from slow …
Vision-based real-time process monitoring and problem feedback for productivity-oriented analysis in off-site construction
X Chen, Y Wang, J Wang, A Bouferguene… - Automation in …, 2024 - Elsevier
The widespread adoption of surveillance cameras in work environments has enabled the
direct and non-intrusive detection of productivity-related issues in the field of construction. In …
M3P: Learning universal representations via multitask multilingual multimodal pre-training
M Ni, H Huang, L Su, E Cui, T Bharti… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present M3P, a Multitask Multilingual Multimodal Pre-trained model that combines
multilingual pre-training and multimodal pre-training into a unified framework via multitask …
Retrieve fast, rerank smart: Cooperative and joint approaches for improved cross-modal retrieval
Current state-of-the-art approaches to cross-modal retrieval process text and visual input
jointly, relying on Transformer-based architectures with cross-attention mechanisms that …
Multilingual multimodal pre-training for zero-shot cross-lingual transfer of vision-language models
This paper studies zero-shot cross-lingual transfer of vision-language models. Specifically,
we focus on multilingual text-to-video search and propose a Transformer-based model that …
MURAL: multimodal, multitask retrieval across languages
Both image-caption pairs and translation pairs provide the means to learn deep
representations of and connections between languages. We use both types of pairs in …
Text to image generation: leaving no language behind
P Reviriego, E Merino-Gómez - arXiv preprint arXiv:2208.09333, 2022 - arxiv.org
One of the latest applications of Artificial Intelligence (AI) is to generate images from natural
language descriptions. These generators are now becoming available and achieve …
Cross-lingual cross-modal retrieval with noise-robust learning
Despite the recent developments in the field of cross-modal retrieval, there has been less
research focusing on low-resource languages due to the lack of manually annotated …
Assessing multilingual fairness in pre-trained multimodal representations
Recently pre-trained multimodal models, such as CLIP, have shown exceptional capabilities
towards connecting images and natural language. The textual representations in English …