Fine-grained image-text matching by cross-modal hard aligning network
Z Pan, F Wu, B Zhang - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
Current state-of-the-art image-text matching methods implicitly align the visual-semantic
fragments, like regions in images and words in sentences, and adopt cross-attention …
Learning semantic relationship among instances for image-text matching
Image-text matching, a bridge connecting image and language, is an important task, which
generally learns a holistic cross-modal embedding to achieve a high-quality semantic …
Cross-modal active complementary learning with self-refining correspondence
Recently, image-text matching has attracted more and more attention from academia and
industry, which is fundamental to understanding the latent correspondence across visual …
Cross-modal semantic enhanced interaction for image-sentence retrieval
Image-sentence retrieval has attracted extensive research attention in multimedia and
computer vision due to its promising application. The key issue lies in jointly learning the …
Interacting-enhancing feature transformer for cross-modal remote-sensing image and text retrieval
Cross-modal remote-sensing image–text retrieval (CMRSITR) is a challenging topic in the
remote-sensing (RS) community. It has gained growing attention because it can be flexibly …
Quaternion relation embedding for scene graph generation
As an important visual understanding task, scene graph generation has been drawing
widespread attention and could boost a broad range of downstream vision applications …
ESA: External space attention aggregation for image-text retrieval
Due to the large gap between vision and language modalities, effective and efficient image-
text retrieval is still an unsolved problem. Recent progress devotes to unilaterally pursuing …
Quaternion representation learning for cross-modal matching
The main challenge of cross-modal matching is to construct a shared subspace reflecting
semantic closeness. Asymmetric relevance, especially the one-to-many matching case …