A sketch is worth a thousand words: Image retrieval with text and sketch

P Sangkloy, W Jitkrittum, D Yang, J Hays - European conference on …, 2022 - Springer

We address the problem of retrieving in-the-wild images with both a sketch and a text query.
We present TASK-former (Text And SKetch transformer), an end-to-end trainable model for …

Cited by 35

A Survey of Multimodal Composite Editing and Retrieval

S Li, F Huang, L Zhang - arXiv preprint arXiv:2409.05405, 2024 - arxiv.org

In the real world, where information is abundant and diverse across different modalities,
understanding and utilizing various data types to improve retrieval systems is a key focus of …

Cited by 2

SceneDiff: Generative scene-level image retrieval with text and sketch using diffusion models

R Zuo, H Hu, X Deng, C Gao, Z Zhang, Y Lai, C Ma… - 2024 - orca.cardiff.ac.uk

Jointly using text and sketch for scene-level image retrieval utilizes the complementary
between text and sketch to describe the fine-grained scene content and retrieve the target …

Multimodal visual and simulated muscle activations for grounded semantics of hand-related descriptions

D Moro, C Kennington - Proceedings of the 22nd Workshop on the …, 2018 - academia.edu

In this paper, we build on research which has applied visually-derived features for grounded
semantics by leveraging an additional modality: simulated hand muscle activations. We …

Cited by 4

Learning to describe multimodally from parallel unimodal data? A pilot study on verbal and sketched object descriptions

T Han, S Zarrieß, K Komatani… - Proceedings of the …, 2018 - clp.ling.uni-potsdam.de

Previous work on multimodality in interaction has mostly focussed on integrating models for
verbal utterances and embodied modalities like gestures. In this paper, we take a first step …

Cited by 1

Semantic Enhanced Sketch Based Image Retrieval with Incomplete Multimodal Query

SD Bhattacharjee, J Yuan - 2020 IEEE Sixth International …, 2020 - ieeexplore.ieee.org

Sketch Based Image Retrieval (SBIR) is a challenging problem mainly due to a significant
cross-domain gap between hand-drawn sketches and natural images. While extra semantic …

Enabling Robots to Draw and Tell: Towards Visually Grounded Multimodal Description Generation

T Han, S Zarrieß - arXiv preprint arXiv:2101.12338, 2021 - arxiv.org

Socially competent robots should be equipped with the ability to perceive the world that
surrounds them and communicate about it in a human-like manner. Representative skills …

Sketch Me if You Can: Towards Generating Detailed Descriptions of Object Shape by Grounding in Images and Drawings

T Han, S Zarrieß - … of the 12th International Conference on Natural …, 2019 - aclanthology.org

A lot of recent work in Language & Vision has looked at generating descriptions or referring
expressions for objects in scenes of real-world images, though focusing mostly on relatively …

Draw and tell: Multimodal descriptions outperform verbal- or sketch-only descriptions in an...
