Computational technologies for fashion recommendation: A survey
Fashion recommendation is a key research field in computational fashion research and has
attracted considerable interest in the computer vision, multimedia, and information retrieval …
Your negative may not be true negative: Boosting image-text matching with false negative elimination
Most existing image-text matching methods adopt triplet loss as the optimization objective,
and choosing a proper negative sample for the triplet of <anchor, positive, negative> is …
Fine-grained image-text alignment in medical imaging enables explainable cyclic image-report generation
Fine-grained vision-language models (VLMs) have been widely used for inter-modality local
alignment between the predefined fixed patches and textual words. However, in medical …
Cross-modal feature alignment and fusion for composed image retrieval
Y Wan, W Wang, G Zou… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Composed Image Retrieval (CIR) presents challenges in expressing search intent
through hybrid-modality queries where users search for a target image using another image …
Align and retrieve: Composition and decomposition learning in image retrieval with text feedback
We study the task of image retrieval with text feedback, where a reference image and
modification text are composed to retrieve the desired target image. To accomplish this goal …
Unifying two-stream encoders with transformers for cross-modal retrieval
Most existing cross-modal retrieval methods employ two-stream encoders with different
architectures for images and texts, e.g., CNN for images and RNN/Transformer for texts. Such …
Simple but effective raw-data level multimodal fusion for composed image retrieval
Composed image retrieval (CIR) aims to retrieve the target image based on a multimodal
query, i.e., a reference image paired with corresponding modification text. Recent CIR studies …
Utilizing greedy nature for multimodal conditional image synthesis in transformers
Multimodal Conditional Image Synthesis (MCIS) aims to generate images according to
inputs of different modalities and their combinations, which allows users to describe their …
Keyword-enhanced recommender system based on inductive graph matrix completion
Going beyond the user–item rating information, recent studies have utilized additional
information to improve the performance of recommender systems. Graph neural network …
SPGAN: siamese projection generative adversarial networks
Noise-to-image synthesis continues to be challenging, despite the application of
advanced loss functions in Generative Adversarial Networks (GANs). The main issue lies in …