Pic2word: Mapping pictures to words for zero-shot composed image retrieval

K Saito, K Sohn, X Zhang, CL Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract In Composed Image Retrieval (CIR), a user combines a query image with text to
describe their intended target. Existing methods rely on supervised learning of CIR models …

Genecis: A benchmark for general conditional image similarity

S Vaze, N Carion, I Misra - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
We argue that there are many notions of 'similarity' and that models, like humans, should be
able to adapt to these dynamically. This contrasts with most representation learning …

Probvlm: Probabilistic adapter for frozen vison-language models

U Upadhyay, S Karthik, M Mancini… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large-scale vision-language models (VLMs) like CLIP successfully find correspondences
between images and text. Through the standard deterministic mapping process, an image or …

Improved probabilistic image-text representations

S Chun - arXiv preprint arXiv:2305.18171, 2023 - arxiv.org
Image-Text Matching (ITM) task, a fundamental vision-language (VL) task, suffers from the
inherent ambiguity arising from multiplicity and imperfect annotations. Deterministic …

Probabilistic contrastive learning recovers the correct aleatoric uncertainty of ambiguous inputs

M Kirchhof, E Kasneci, SJ Oh - International Conference on …, 2023 - proceedings.mlr.press
Contrastively trained encoders have recently been proven to invert the data-generating
process: they encode each input, e.g., an image, into the true latent vector that generated the …

Attribute-guided pedestrian retrieval: Bridging person re-id with internal attribute variability

Y Huang, Z Zhang, Q Wu, Y Zhong… - Proceedings of the …, 2024 - openaccess.thecvf.com
In various domains such as surveillance and smart retail, pedestrian retrieval, centering on
person re-identification (Re-ID), plays a pivotal role. Existing Re-ID methodologies often …

Hierarchical matching and reasoning for multi-query image retrieval

Z Ji, Z Li, Y Zhang, H Wang, Y Pang, X Li - Neural Networks, 2024 - Elsevier
As a promising field, Multi-Query Image Retrieval (MQIR) aims at searching for the
semantically relevant image given multiple region-specific text queries. Existing works …

Diffcad: Weakly-supervised probabilistic cad model retrieval and alignment from an rgb image

D Gao, D Rozenberszki, S Leutenegger… - ACM Transactions on …, 2024 - dl.acm.org
Perceiving 3D structures from RGB images based on CAD model primitives can enable an
effective, efficient 3D object-based representation of scenes. However, current approaches …

Robust multimodal learning via representation decoupling

S Wei, Y Luo, Y Wang, C Luo - European Conference on Computer Vision, 2024 - Springer
Multimodal learning robust to missing modality has attracted increasing attention due to its
practicality. Existing methods tend to address it by learning a common subspace …

Real20M: A large-scale e-commerce dataset for cross-domain retrieval

Y Chen, H Zhong, X He, Y Peng, L Cheng - Proceedings of the 31st ACM …, 2023 - dl.acm.org
In e-commerce, products and micro-videos serve as two primary carriers. Introducing cross-
domain retrieval between these carriers can establish associations, thereby leading to the …