A sketch is worth a thousand words: Image retrieval with text and sketch

P Sangkloy, W Jitkrittum, D Yang, J Hays - European conference on …, 2022 - Springer
We address the problem of retrieving in-the-wild images with both a sketch and a text query.
We present TASK-former (Text And SKetch transformer), an end-to-end trainable model for …

A Survey of Multimodal Composite Editing and Retrieval

S Li, F Huang, L Zhang - arXiv preprint arXiv:2409.05405, 2024 - arxiv.org
In the real world, where information is abundant and diverse across different modalities,
understanding and utilizing various data types to improve retrieval systems is a key focus of …

SceneDiff: Generative scene-level image retrieval with text and sketch using diffusion models

R Zuo, H Hu, X Deng, C Gao, Z Zhang, Y Lai, C Ma… - 2024 - orca.cardiff.ac.uk
Jointly using text and sketch for scene-level image retrieval utilizes the complementarity
between text and sketch to describe the fine-grained scene content and retrieve the target …

Multimodal visual and simulated muscle activations for grounded semantics of hand-related descriptions

D Moro, C Kennington - Proceedings of the 22nd Workshop on the …, 2018 - academia.edu
In this paper, we build on research which has applied visually-derived features for grounded
semantics by leveraging an additional modality: simulated hand muscle activations. We …

Learning to describe multimodally from parallel unimodal data? A pilot study on verbal and sketched object descriptions

T Han, S Zarrieß, K Komatani… - Proceedings of the …, 2018 - clp.ling.uni-potsdam.de
Previous work on multimodality in interaction has mostly focussed on integrating models for
verbal utterances and embodied modalities like gestures. In this paper, we take a first step …

Semantic Enhanced Sketch Based Image Retrieval with Incomplete Multimodal Query

SD Bhattacharjee, J Yuan - 2020 IEEE Sixth International …, 2020 - ieeexplore.ieee.org
Sketch Based Image Retrieval (SBIR) is a challenging problem mainly due to a significant
cross-domain gap between hand-drawn sketches and natural images. While extra semantic …

Enabling Robots to Draw and Tell: Towards Visually Grounded Multimodal Description Generation

T Han, S Zarrieß - arXiv preprint arXiv:2101.12338, 2021 - arxiv.org
Socially competent robots should be equipped with the ability to perceive the world that
surrounds them and communicate about it in a human-like manner. Representative skills …

Sketch Me if You Can: Towards Generating Detailed Descriptions of Object Shape by Grounding in Images and Drawings

T Han, S Zarrieß - … of the 12th International Conference on Natural …, 2019 - aclanthology.org
A lot of recent work in Language & Vision has looked at generating descriptions or referring
expressions for objects in scenes of real-world images, though focusing mostly on relatively …