Cross-Modal Retrieval: A Review of Methodologies, Datasets, and Future Perspectives
With the rapid development of science and technology, all types of mixed media contain
large amounts of data. Traditional single multimedia data can no longer satisfy daily …
FashionSAP: Symbols and attributes prompt for fine-grained fashion vision-language pre-training
Fashion vision-language pre-training models have shown efficacy for a wide range of
downstream tasks. However, general vision-language pre-training models pay less attention …
Question-conditioned debiasing with focal visual context fusion for visual question answering
J Liu, GX Wang, CF Fan, F Zhou, HJ Xu - Knowledge-Based Systems, 2023 - Elsevier
Existing Visual Question Answering models suffer from the language prior, where
the answers provided by the models overly rely on the correlations between questions and …
All in one: Exploring unified vision-language tracking with multi-modal alignment
Current mainstream vision-language (VL) tracking framework consists of three parts, i.e., a
visual feature extractor, a language feature extractor, and a fusion model. To pursue better …
Deep supervised dual cycle adversarial network for cross-modal retrieval
Cross-modal retrieval tasks, which are more natural and challenging than traditional
retrieval tasks, have attracted increasing interest from researchers in recent years. Although …
Contrastive label correlation enhanced unified hashing encoder for cross-modal retrieval
Cross-modal hashing (CMH) has been widely used in multimedia retrieval applications for
its low storage cost and fast indexing speed. Thanks to the success of deep learning, cross …
Partial visual-semantic embedding: Fine-grained outfit image representation with massive volumes of tags via angular-based contrastive learning
A novel technology named fashion intelligence system, which quantifies ambiguous
expressions unique to fashion, such as “casual,” “adult-casual,” and “office-casual,” was …
MiC: Image-text Matching in Circles with cross-modal generative knowledge enhancement
Image-text matching is a challenging task due to vast discrepancies between the visual and
textual modalities. Existing solutions tend to focus on a limited set of strongly aligned or …
Multimodal Distillation Pre-training Model for Ultrasound Dynamic Images Annotation
X Chen, J Ke, Y Zhang, J Gou, A Shen… - IEEE Journal of …, 2024 - ieeexplore.ieee.org
With the development of medical technology, ultrasonography has become an important
diagnostic method in doctors' clinical work. However, compared with the static medical …
Collaborative group: Composed image retrieval via consensus learning from noisy annotations
X Zhang, Z Zheng, L Zhu, Y Yang - Knowledge-Based Systems, 2024 - Elsevier
Composed image retrieval extends content-based image retrieval systems by enabling
users to search using reference images and captions that describe their intention. Despite …