Large-scale multi-modal pre-trained models: A comprehensive survey

X Wang, G Chen, G Qian, P Gao, XY Wei… - Machine Intelligence …, 2023 - Springer
With the urgent demand for generalized deep models, many large pre-trained models have been proposed, such as bidirectional encoder representations from transformers (BERT), vision transformer (ViT) …

Vlp: A survey on vision-language pre-training

FL Chen, DZ Zhang, ML Han, XY Chen, J Shi… - Machine Intelligence …, 2023 - Springer
In the past few years, the emergence of pre-training models has brought uni-modal fields
such as computer vision (CV) and natural language processing (NLP) to a new era …

Dynamic modality interaction modeling for image-text retrieval

L Qu, M Liu, J Wu, Z Gao, L Nie - … of the 44th International ACM SIGIR …, 2021 - dl.acm.org
Image-text retrieval is a fundamental and crucial branch in information retrieval. Although
much progress has been made in bridging vision and language, it remains challenging …

Kaleido-bert: Vision-language pre-training on fashion domain

M Zhuge, D Gao, DP Fan… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present a new vision-language (VL) pre-training model dubbed Kaleido-BERT, which
introduces a novel kaleido strategy for fashion cross-modality representations from …

Image-text retrieval: A survey on recent research and development

M Cao, S Li, J Li, L Nie, M Zhang - arXiv preprint arXiv:2203.14713, 2022 - arxiv.org
In the past few years, cross-modal image-text retrieval (ITR) has experienced increased
interest in the research community due to its excellent research value and broad real-world …

M6: A Chinese multimodal pretrainer

J Lin, R Men, A Yang, C Zhou, M Ding, Y Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
In this work, we construct the largest dataset for multimodal pretraining in Chinese, which
consists of over 1.9 TB of images and 292 GB of texts that cover a wide range of domains. We …

Ei-clip: Entity-aware interventional contrastive learning for e-commerce cross-modal retrieval

H Ma, H Zhao, Z Lin, A Kale, Z Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com
… recommendation, and marketing services. Extensive efforts have been made to
conquer the cross-modal retrieval problem in the general domain. When it comes to e-commerce …

Fame-vil: Multi-tasking vision-language model for heterogeneous fashion tasks

X Han, X Zhu, L Yu, L Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
In the fashion domain, there exists a variety of vision-and-language (V+L) tasks, including
cross-modal retrieval, text-guided image retrieval, multi-modal classification, and image …

Cross-modal retrieval: a systematic review of methods and future directions

T Wang, F Li, L Zhu, J Li, Z Zhang… - Proceedings of the …, 2025 - ieeexplore.ieee.org
With the exponential surge in diverse multimodal data, traditional unimodal retrieval
methods struggle to meet the needs of users seeking access to data across various …

Vision-and-language pretrained models: A survey

S Long, F Cao, SC Han, H Yang - arXiv preprint arXiv:2204.07356, 2022 - arxiv.org
Pretrained models have achieved great success in both Computer Vision (CV) and Natural
Language Processing (NLP). This progress has led to learning joint representations of vision …