Fine-grained image-text matching by cross-modal hard aligning network

Z Pan, F Wu, B Zhang - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
Current state-of-the-art image-text matching methods implicitly align the visual-semantic
fragments, like regions in images and words in sentences, and adopt cross-attention …

Learning semantic relationship among instances for image-text matching

Z Fu, Z Mao, Y Song, Y Zhang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Image-text matching, a bridge connecting image and language, is an important task, which
generally learns a holistic cross-modal embedding to achieve a high-quality semantic …

Cross-modal active complementary learning with self-refining correspondence

Y Qin, Y Sun, D Peng, JT Zhou… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recently, image-text matching has attracted more and more attention from academia and
industry, which is fundamental to understanding the latent correspondence across visual …

Cross-modal semantic enhanced interaction for image-sentence retrieval

X Ge, F Chen, S Xu, F Tao… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Image-sentence retrieval has attracted extensive research attention in multimedia and
computer vision due to its promising application. The key issue lies in jointly learning the …

Interacting-enhancing feature transformer for cross-modal remote-sensing image and text retrieval

X Tang, Y Wang, J Ma, X Zhang, F Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Cross-modal remote-sensing image–text retrieval (CMRSITR) is a challenging topic in the
remote-sensing (RS) community. It has gained growing attention because it can be flexibly …

Quaternion relation embedding for scene graph generation

Z Wang, X Xu, G Wang, Y Yang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
As an important visual understanding task, scene graph generation has been drawing
widespread attention and could boost a broad range of downstream vision applications …

ESA: External space attention aggregation for image-text retrieval

H Zhu, C Zhang, Y Wei, S Huang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Due to the large gap between vision and language modalities, effective and efficient
image-text retrieval is still an unsolved problem. Recent progress devotes to unilaterally pursuing …

Quaternion representation learning for cross-modal matching

Z Wang, X Xu, J Wei, N Xie, J Shao, Y Yang - Knowledge-Based Systems, 2023 - Elsevier
The main challenge of cross-modal matching is to construct a shared subspace reflecting
semantic closeness. Asymmetric relevance, especially the one-to-many matching case …