Problems and opportunities in training deep learning software systems: An analysis of variance

HV Pham, S Qian, J Wang, T Lutellier… - Proceedings of the 35th …, 2020 - dl.acm.org
Deep learning (DL) training algorithms utilize nondeterminism to improve models' accuracy
and training efficiency. Hence, multiple identical training runs (e.g., identical training data …

Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

Pic2Word: Mapping pictures to words for zero-shot composed image retrieval

K Saito, K Sohn, X Zhang, CL Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
In Composed Image Retrieval (CIR), a user combines a query image with text to
describe their intended target. Existing methods rely on supervised learning of CIR models …

Zero-shot composed image retrieval with textual inversion

A Baldrati, L Agnolucci, M Bertini… - Proceedings of the …, 2023 - openaccess.thecvf.com
Composed Image Retrieval (CIR) aims to retrieve a target image based on a query
composed of a reference image and a relative caption that describes the difference between …

A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

K Bayoudh, R Knani, F Hamdaoui, A Mtibaa - The Visual Computer, 2022 - Springer
The research progress in multimodal learning has grown rapidly over the last decade in
several areas, especially in computer vision. The growing potential of multimodal data …

Effective conditioned and composed image retrieval combining CLIP-based features

A Baldrati, M Bertini, T Uricchio… - Proceedings of the …, 2022 - openaccess.thecvf.com
Conditioned and composed image retrieval extend CBIR systems by combining a query
image with an additional text that expresses the intent of the user, describing additional …

Image retrieval on real-life images with pre-trained vision-and-language models

Z Liu, C Rodriguez-Opazo… - Proceedings of the …, 2021 - openaccess.thecvf.com
We extend the task of composed image retrieval, where an input query consists of an image
and short textual description of how to modify the image. Existing methods have only been …

Language-only training of zero-shot composed image retrieval

G Gu, S Chun, W Kim, Y Kang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Composed image retrieval (CIR) task takes a composed query of image and text aiming to
search relative images for both conditions. Conventional CIR approaches need a training …

Knowledge-enhanced dual-stream zero-shot composed image retrieval

Y Suo, F Ma, L Zhu, Y Yang - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
We study the zero-shot Composed Image Retrieval (ZS-CIR) task which is to retrieve the
target image given a reference image and a description without training on the triplet …

FashionVLP: Vision language transformer for fashion retrieval with feedback

S Goenka, Z Zheng, A Jaiswal… - Proceedings of the …, 2022 - openaccess.thecvf.com
Fashion image retrieval based on a query pair of reference image and natural language
feedback is a challenging task that requires models to assess fashion related information …