Knowledge-enhanced dual-stream zero-shot composed image retrieval

Y Suo, F Ma, L Zhu, Y Yang - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
We study the zero-shot Composed Image Retrieval (ZS-CIR) task, which is to retrieve the
target image given a reference image and a description without training on the triplet …

Towards flexible perception with visual memory

R Geirhos, P Jaini, A Stone, S Medapati, X Yi… - arXiv preprint arXiv …, 2024 - arxiv.org
Training a neural network is a monolithic endeavor, akin to carving knowledge into stone:
once the process is completed, editing the knowledge in a network is nearly impossible …

Anchor-based Robust Finetuning of Vision-Language Models

J Han, Z Lin, Z Sun, Y Gao, K Yan… - Proceedings of the …, 2024 - openaccess.thecvf.com
We aim at finetuning a vision-language model without hurting its out-of-distribution (OOD)
generalization. We address two types of OOD generalization, i.e., i) domain shift such as …

Context-aware multimodal pretraining

K Roth, Z Akata, D Damen, I Balažević… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale multimodal representation learning successfully optimizes for zero-shot transfer
at test time. Yet the standard pretraining paradigm (contrastive learning on large amounts of …