- Academic Search

Z Han, A Azman, MR Mustaffa, FB Khalid - IEEE Access, 2024 - ieeexplore.ieee.org

With the rapid development of science and technology, all types of mixed media contain
large amounts of data. Traditional single multimedia data can no longer satisfy daily …

Cited by 3; 3 versions

[PDF] thecvf.com

FashionSAP: Symbols and attributes prompt for fine-grained fashion vision-language pre-training

Y Han, L Zhang, Q Chen, Z Chen, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com

Fashion vision-language pre-training models have shown efficacy for a wide range of
downstream tasks. However, general vision-language pre-training models pay less attention …

Cited by 15; 5 versions

Question-conditioned debiasing with focal visual context fusion for visual question answering

J Liu, GX Wang, CF Fan, F Zhou, HJ Xu - Knowledge-Based Systems, 2023 - Elsevier

Existing Visual Question Answering models suffer from the language prior, where
the answers provided by the models overly rely on the correlations between questions and …

Cited by 8; 3 versions

[PDF] arxiv.org

All in one: Exploring unified vision-language tracking with multi-modal alignment

C Zhang, X Sun, Y Yang, L Liu, Q Liu, X Zhou… - Proceedings of the 31st …, 2023 - dl.acm.org

Current mainstream vision-language (VL) tracking framework consists of three parts, i.e., a
visual feature extractor, a language feature extractor, and a fusion model. To pursue better …

Cited by 16; 3 versions

Deep supervised dual cycle adversarial network for cross-modal retrieval

L Liao, M Yang, B Zhang - … on Circuits and Systems for Video …, 2022 - ieeexplore.ieee.org

Cross-modal retrieval tasks, which are more natural and challenging than traditional
retrieval tasks, have attracted increasing interest from researchers in recent years. Although …

Cited by 14; 2 versions

[PDF] researchgate.net

Contrastive label correlation enhanced unified hashing encoder for cross-modal retrieval

H Wu, L Zhang, Q Chen, Y Deng, J Siebert… - Proceedings of the 31st …, 2022 - dl.acm.org

Cross-modal hashing (CMH) has been widely used in multimedia retrieval applications for
its low storage cost and fast indexing speed. Thanks to the success of deep learning, cross …

Cited by 13; 4 versions

[HTML] sciencedirect.com

[HTML] Partial visual-semantic embedding: Fine-grained outfit image representation with massive volumes of tags via angular-based contrastive learning

R Shimizu, T Nakamura, M Goto - Knowledge-Based Systems, 2023 - Elsevier

A novel technology named fashion intelligence system, which quantifies ambiguous
expressions unique to fashion, such as “casual,” “adult-casual,” and “office-casual,” was …

Cited by 5; 4 versions

[PDF] ssrn.com

MiC: Image-text Matching in Circles with cross-modal generative knowledge enhancement

X Pu, Y Chen, L Yuan, Y Zhang, H Li, L Jing… - Knowledge-Based …, 2024 - Elsevier

Image-text matching is a challenging task due to vast discrepancies between the visual and
textual modalities. Existing solutions tend to focus on a limited set of strongly aligned or …

Cited by 2; 2 versions

Multimodal Distillation Pre-training Model for Ultrasound Dynamic Images Annotation

X Chen, J Ke, Y Zhang, J Gou, A Shen… - IEEE Journal of …, 2024 - ieeexplore.ieee.org

With the development of medical technology, ultrasonography has become an important
diagnostic method in doctors' clinical work. However, compared with the static medical …

Cited by 1; 2 versions

[PDF] ssrn.com

Collaborative group: Composed image retrieval via consensus learning from noisy annotations

X Zhang, Z Zheng, L Zhu, Y Yang - Knowledge-Based Systems, 2024 - Elsevier

Composed image retrieval extends content-based image retrieval systems by enabling
users to search using reference images and captions that describe their intention. Despite …

Cited by 1; 2 versions

VLDeformer: Vision–Language Decomposed Transformer for fast cross-modal retrieval

Cross-Modal Retrieval: A Review of Methodologies, Datasets, and Future Perspectives
