Computational technologies for fashion recommendation: A survey

Y Ding, Z Lai, PY Mok, TS Chua - ACM Computing Surveys, 2023 - dl.acm.org
Fashion recommendation is a key research field in computational fashion research and has
attracted considerable interest in the computer vision, multimedia, and information retrieval …

Your negative may not be true negative: Boosting image-text matching with false negative elimination

H Li, Y Bin, J Liao, Y Yang, HT Shen - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Most existing image-text matching methods adopt triplet loss as the optimization objective,
and choosing a proper negative sample for the triplet of <anchor, positive, negative> is …

Fine-grained image-text alignment in medical imaging enables explainable cyclic image-report generation

W Chen, L Shen, J Lin, J Luo, X Li… - Proceedings of the 62nd …, 2024 - aclanthology.org
Fine-grained vision-language models (VLM) have been widely used for inter-modality local
alignment between the predefined fixed patches and textual words. However, in medical …

Cross-modal feature alignment and fusion for composed image retrieval

Y Wan, W Wang, G Zou… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Composed Image Retrieval (CIR) presents challenges in expressing search intent
through hybrid-modality queries where users search for a target image using another image …

Align and retrieve: Composition and decomposition learning in image retrieval with text feedback

Y Xu, Y Bin, J Wei, Y Yang, G Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
We study the task of image retrieval with text feedback, where a reference image and
modification text are composed to retrieve the desired target image. To accomplish this goal …

Unifying two-stream encoders with transformers for cross-modal retrieval

Y Bin, H Li, Y Xu, X Xu, Y Yang, HT Shen - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Most existing cross-modal retrieval methods employ two-stream encoders with different
architectures for images and texts, e.g., CNN for images and RNN/Transformer for texts. Such …

Simple but effective raw-data level multimodal fusion for composed image retrieval

H Wen, X Song, X Chen, Y Wei, L Nie… - Proceedings of the 47th …, 2024 - dl.acm.org
Composed image retrieval (CIR) aims to retrieve the target image based on a multimodal
query, i.e., a reference image paired with corresponding modification text. Recent CIR studies …

Utilizing greedy nature for multimodal conditional image synthesis in transformers

S Su, J Zhu, L Gao, J Song - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org
Multimodal Conditional Image Synthesis (MCIS) aims to generate images according to
different modalities input and their combination, which allows users to describe their …

Keyword-enhanced recommender system based on inductive graph matrix completion

D Han, D Kim, K Han, MY Yi - Engineering Applications of Artificial …, 2024 - Elsevier
Going beyond the user–item rating information, recent studies have utilized additional
information to improve the performance of recommender systems. Graph neural network …

SPGAN: siamese projection generative adversarial networks

Y Gan, T Xiang, D Ouyang, M Zhou, M Ye - Knowledge-Based Systems, 2024 - Elsevier
Noise-to-image synthesis continues to be challenging, despite the application of the
advanced loss functions in Generative Adversarial Networks (GANs). The main issue lies in …