Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

User simulation for evaluating information access systems

K Balog, CX Zhai - Proceedings of the Annual International ACM SIGIR …, 2023 - dl.acm.org
With the emergence of various information access systems exhibiting increasing complexity,
there is a critical need for sound and scalable means of automatic evaluation. To address …

Zero-shot composed image retrieval with textual inversion

A Baldrati, L Agnolucci, M Bertini… - Proceedings of the …, 2023 - openaccess.thecvf.com
Composed Image Retrieval (CIR) aims to retrieve a target image based on a query
composed of a reference image and a relative caption that describes the difference between …

Image retrieval on real-life images with pre-trained vision-and-language models

Z Liu, C Rodriguez-Opazo… - Proceedings of the …, 2021 - openaccess.thecvf.com
We extend the task of composed image retrieval, where an input query consists of an image
and a short textual description of how to modify the image. Existing methods have only been …

FashionVLP: Vision language transformer for fashion retrieval with feedback

S Goenka, Z Zheng, A Jaiswal… - Proceedings of the …, 2022 - openaccess.thecvf.com
Fashion image retrieval based on a query pair of reference image and natural language
feedback is a challenging task that requires models to assess fashion-related information …

Composing text and image for image retrieval - an empirical odyssey

N Vo, L Jiang, C Sun, K Murphy, LJ Li… - Proceedings of the …, 2019 - openaccess.thecvf.com
In this paper, we study the task of image retrieval, where the input query is specified in the
form of an image plus some text that describes desired modifications to the input image. For …

The 7th AI City Challenge

M Naphade, S Wang, DC Anastasiu… - Proceedings of the …, 2023 - openaccess.thecvf.com
The AI City Challenge's seventh edition emphasizes two domains at the intersection
of computer vision and artificial intelligence: retail business and Intelligent Traffic Systems …

CoVR: Learning composed video retrieval from web video captions

L Ventura, A Yang, C Schmid, G Varol - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Composed Image Retrieval (CoIR) has recently gained popularity as a task that considers
both text and image queries together, to search for relevant images in a database. Most …

CoSMo: Content-style modulation for image retrieval with text feedback

S Lee, D Kim, B Han - … of the IEEE/CVF Conference on …, 2021 - openaccess.thecvf.com
We tackle the task of image retrieval with text feedback, where a reference image and
modifier text are combined to identify the desired target image. We focus on designing an …

Image search with text feedback by visiolinguistic attention learning

Y Chen, S Gong, L Bazzani - Proceedings of the IEEE/CVF …, 2020 - openaccess.thecvf.com
Image search with text feedback has promising impacts in various real-world applications,
such as e-commerce and internet search. Given a reference image and text feedback from …