- Academic Search

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com

This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Cited by 197

[PDF] ieee.org

AI assisted fashion design: A review

Z Guo, Z Zhu, Y Li, S Cao, H Chen, G Wang - IEEE Access, 2023 - ieeexplore.ieee.org

This review explores the integration of enhanced personalization and seamless multimodal
interfaces in the field of fashion design and recommendation. We examine the increasing …

Cited by 34

[PDF] thecvf.com

Pic2Word: Mapping pictures to words for zero-shot composed image retrieval

K Saito, K Sohn, X Zhang, CL Li… - Proceedings of the …, 2023 - openaccess.thecvf.com

In Composed Image Retrieval (CIR), a user combines a query image with text to
describe their intended target. Existing methods rely on supervised learning of CIR models …

Cited by 106

[PDF] arxiv.org

VLT: Vision-language transformer and query generation for referring segmentation

H Ding, C Liu, S Wang, X Jiang - IEEE Transactions on Pattern …, 2022 - ieeexplore.ieee.org

We propose a Vision-Language Transformer (VLT) framework for referring segmentation to
facilitate deep interactions among multi-modal information and enhance the holistic …

Cited by 129

[PDF] aaai.org

CoVR: Learning composed video retrieval from web video captions

L Ventura, A Yang, C Schmid, G Varol - Proceedings of the AAAI …, 2024 - ojs.aaai.org

Composed Image Retrieval (CoIR) has recently gained popularity as a task that considers
both text and image queries together, to search for relevant images in a database. Most …

Cited by 32

[PDF] thecvf.com

FAME-ViL: Multi-tasking vision-language model for heterogeneous fashion tasks

X Han, X Zhu, L Yu, L Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

In the fashion domain, there exists a variety of vision-and-language (V+L) tasks, including
cross-modal retrieval, text-guided image retrieval, multi-modal classification, and image …

Cited by 29

[PDF] thecvf.com

You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval

S Koley, AK Bhunia, A Sain… - Proceedings of the …, 2024 - openaccess.thecvf.com

Two primary input modalities prevail in image retrieval: sketch and text. While text is widely
used for inter-category retrieval tasks, sketches have been established as the sole preferred …

Cited by 10

[PDF] thecvf.com

Multilateral semantic relations modeling for image text retrieval

Z Wang, Z Gao, K Guo, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Image-text retrieval is a fundamental task to bridge vision and language by exploiting
various strategies to fine-grained alignment between regions and words. This is still tough …

Cited by 25

[PDF] neurips.cc

Chatting makes perfect: Chat-based image retrieval

M Levy, R Ben-Ari, N Darshan… - Advances in Neural …, 2024 - proceedings.neurips.cc

Chats emerge as an effective user-friendly approach for information retrieval, and are
successfully employed in many domains, such as customer service, healthcare, and finance …

Cited by 22

[PDF] arxiv.org

Target-guided composed image retrieval

H Wen, X Zhang, X Song, Y Wei, L Nie - Proceedings of the 31st ACM …, 2023 - dl.acm.org

Composed image retrieval (CIR) is a new and flexible image retrieval paradigm, which can
retrieve the target image for a multimodal query, including a reference image and its …

Cited by 26

