Google Scholar

Image-text retrieval: A survey on recent research and development

M Cao, S Li, J Li, L Nie, M Zhang - arXiv preprint arXiv:2203.14713, 2022 - arxiv.org

In the past few years, cross-modal image-text retrieval (ITR) has experienced increased
interest in the research community due to its excellent research value and broad real-world …

Cited by 102 - All 4 versions

Cross-modal retrieval: a systematic review of methods and future directions

T Wang, F Li, L Zhu, J Li, Z Zhang… - Proceedings of the …, 2025 - ieeexplore.ieee.org

With the exponential surge in diverse multimodal data, traditional unimodal retrieval
methods struggle to meet the needs of users seeking access to data across various …

Cited by 25 - All 3 versions

FashionVLP: Vision language transformer for fashion retrieval with feedback

S Goenka, Z Zheng, A Jaiswal… - Proceedings of the …, 2022 - openaccess.thecvf.com

Fashion image retrieval based on a query pair of reference image and natural language
feedback is a challenging task that requires models to assess fashion related information …

Cited by 102 - All 5 versions

ViSTA: Vision and scene text aggregation for cross-modal retrieval

M Cheng, Y Sun, L Wang, X Zhu… - Proceedings of the …, 2022 - openaccess.thecvf.com

Visual appearance is considered to be the most important cue to understand images for
cross-modal retrieval, while sometimes the scene text appearing in images can provide …

Cited by 80 - All 7 versions

A large cross-modal video retrieval dataset with reading comprehension

W Wu, Y Zhao, Z Li, J Li, H Zhou, MZ Shou, X Bai - Pattern Recognition, 2025 - Elsevier

Most existing cross-modal language-to-video retrieval (VR) research focuses on single-modal
input from video, i.e., visual representation, while the text is omnipresent in human …

Cited by 20 - All 8 versions

MMpedia: A large-scale multi-modal knowledge graph

Y Wu, X Wu, J Li, Y Zhang, H Wang, W Du, Z He… - International semantic …, 2023 - Springer

Knowledge graphs serve as crucial resources for various applications. However,
most existing knowledge graphs present symbolic knowledge in the form of natural …

Cited by 18 - All 4 versions

OCR-IDL: OCR annotations for industry document library dataset

AF Biten, R Tito, L Gomez, E Valveny… - European Conference on …, 2022 - Springer

Pretraining has proven successful in Document Intelligence tasks where deluge of
documents are used to pretrain the models only later to be finetuned on downstream tasks …

Cited by 33 - All 9 versions

Is an image worth five sentences? A new look into semantics for image-text matching

AF Biten, A Mafla, L Gómez… - Proceedings of the …, 2022 - openaccess.thecvf.com

The task of image-text matching aims to map representations from different modalities into a
common joint visual-textual embedding. However, the most widely used datasets for this …

Cited by 25 - All 10 versions

BCRA: bidirectional cross-modal implicit relation reasoning and aligning for text-to-image person retrieval

Z Li, Y Xie - Multimedia Systems, 2024 - Springer

Text-to-image person retrieval aims to retrieve relevant target individuals based on given
textual descriptions. The main challenge faced by this task is how to better combine and …

Cited by 3 - All 2 versions

Adaptive transformer-based conditioned variational autoencoder for incomplete social event classification

Z Li, S Qian, J Cao, Q Fang, C Xu - Proceedings of the 30th ACM …, 2022 - dl.acm.org

With the rapid development of the Internet and the expanding scale of social media,
incomplete social event classification has increasingly become a challenging task. The key …

Cited by 9
