Multimodal research in vision and language: A review of current and emerging trends
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …
User simulation for evaluating information access systems
With the emergence of various information access systems exhibiting increasing complexity,
there is a critical need for sound and scalable means of automatic evaluation. To address …
Zero-shot composed image retrieval with textual inversion
Composed Image Retrieval (CIR) aims to retrieve a target image based on a query
composed of a reference image and a relative caption that describes the difference between …
Image retrieval on real-life images with pre-trained vision-and-language models
We extend the task of composed image retrieval, where an input query consists of an image
and short textual description of how to modify the image. Existing methods have only been …
FashionVLP: Vision language transformer for fashion retrieval with feedback
Fashion image retrieval based on a query pair of reference image and natural language
feedback is a challenging task that requires models to assess fashion related information …
Composing text and image for image retrieval - an empirical odyssey
In this paper, we study the task of image retrieval, where the input query is specified in the
form of an image plus some text that describes desired modifications to the input image. For …
The 7th AI City Challenge
The AI City Challenge's seventh edition emphasizes two domains at the intersection
of computer vision and artificial intelligence - retail business and Intelligent Traffic Systems …
CoVR: Learning composed video retrieval from web video captions
Composed Image Retrieval (CoIR) has recently gained popularity as a task that considers
both text and image queries together, to search for relevant images in a database. Most …
CoSMo: Content-style modulation for image retrieval with text feedback
We tackle the task of image retrieval with text feedback, where a reference image and
modifier text are combined to identify the desired target image. We focus on designing an …
Image search with text feedback by visiolinguistic attention learning
Image search with text feedback has promising impacts in various real-world applications,
such as e-commerce and internet search. Given a reference image and text feedback from …