Cross-modal retrieval: a systematic review of methods and future directions
With the exponential surge in diverse multimodal data, traditional unimodal retrieval
methods struggle to meet the needs of users seeking access to data across various …
methods struggle to meet the needs of users seeking access to data across various …
Towards robust pattern recognition: A review
The accuracies for many pattern recognition tasks have increased rapidly year by year,
achieving or even outperforming human performance. From the perspective of accuracy …
achieving or even outperforming human performance. From the perspective of accuracy …
Graph neural networks: foundation, frontiers and applications
The field of graph neural networks (GNNs) has seen rapid and incredible strides over the
recent years. Graph neural networks, also known as deep learning on graphs, graph …
recent years. Graph neural networks, also known as deep learning on graphs, graph …
Negative-aware attention framework for image-text matching
Image-text matching, as a fundamental task, bridges the gap between vision and language.
The key of this task is to accurately measure similarity between these two modalities. Prior …
The key of this task is to accurately measure similarity between these two modalities. Prior …
Similarity reasoning and filtration for image-text matching
Image-text matching plays a critical role in bridging the vision and language, and great
progress has been made by exploiting the global alignment between image and sentence …
progress has been made by exploiting the global alignment between image and sentence …
Dual-level representation enhancement on characteristic and context for image-text retrieval
S Yang, Q Li, W Li, X Li, AA Liu - IEEE Transactions on Circuits …, 2022 - ieeexplore.ieee.org
Image-text retrieval is a fundamental and vital task in multi-media retrieval and has received
growing attention since it connects heterogeneous data. Previous methods that perform well …
growing attention since it connects heterogeneous data. Previous methods that perform well …
Visual semantic reasoning for image-text matching
Image-text matching has been a hot research topic bridging the vision and language areas.
It remains challenging because the current representation of image usually lacks global …
It remains challenging because the current representation of image usually lacks global …
Learning the best pooling strategy for visual semantic embedding
Abstract Visual Semantic Embedding (VSE) is a dominant approach for vision-language
retrieval, which aims at learning a deep embedding space such that visual data are …
retrieval, which aims at learning a deep embedding space such that visual data are …
Multi-modality cross attention network for image and sentence matching
The key of image and sentence matching is to accurately measure the visual-semantic
similarity between an image and a sentence. However, most existing methods make use of …
similarity between an image and a sentence. However, most existing methods make use of …
Stacked cross attention for image-text matching
In this paper, we study the problem of image-text matching. Inferring the latent semantic
alignment between objects or other salient stuff (eg snow, sky, lawn) and the corresponding …
alignment between objects or other salient stuff (eg snow, sky, lawn) and the corresponding …