A review of multimodal image matching: Methods and applications

X Jiang, J Ma, G Xiao, Z Shao, X Guo - Information Fusion, 2021 - Elsevier
Multimodal image matching, which refers to identifying and then corresponding the same or
similar structure/content from two or more images that are of significant modalities or …

Image-text retrieval: A survey on recent research and development

M Cao, S Li, J Li, L Nie, M Zhang - arXiv preprint arXiv:2203.14713, 2022 - arxiv.org
In the past few years, cross-modal image-text retrieval (ITR) has experienced increased
interest in the research community due to its excellent research value and broad real-world …

Similarity reasoning and filtration for image-text matching

H Diao, Y Zhang, L Ma, H Lu - Proceedings of the AAAI conference on …, 2021 - ojs.aaai.org
Image-text matching plays a critical role in bridging the vision and language, and great
progress has been made by exploiting the global alignment between image and sentence …

Towards artificial general intelligence via a multimodal foundation model

N Fei, Z Lu, Y Gao, G Yang, Y Huo, J Wen, H Lu… - Nature …, 2022 - nature.com
The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of
human. Despite tremendous success in the AI research, most of existing methods have only …

Probabilistic embeddings for cross-modal retrieval

S Chun, SJ Oh, RS De Rezende… - Proceedings of the …, 2021 - openaccess.thecvf.com
Cross-modal retrieval methods build a common representation space for samples from
multiple modalities, typically from the vision and the language domains. For images and …

Changer: Feature interaction is what you need for change detection

S Fang, K Li, Z Li - IEEE Transactions on Geoscience and …, 2023 - ieeexplore.ieee.org
Change detection is an important tool for long-term Earth observation missions. It takes
bi-temporal images as input and predicts “where” the change has occurred. Different from other …

Dynamic modality interaction modeling for image-text retrieval

L Qu, M Liu, J Wu, Z Gao, L Nie - … of the 44th International ACM SIGIR …, 2021 - dl.acm.org
Image-text retrieval is a fundamental and crucial branch in information retrieval. Although
much progress has been made in bridging vision and language, it remains challenging …

Deep learning for cross-domain data fusion in urban computing: Taxonomy, advances, and outlook

X Zou, Y Yan, X Hao, Y Hu, H Wen, E Liu, J Zhang… - Information …, 2025 - Elsevier
As cities continue to burgeon, Urban Computing emerges as a pivotal discipline for
sustainable development by harnessing the power of cross-domain data fusion from diverse …

Fine-grained image-text matching by cross-modal hard aligning network

Z Pan, F Wu, B Zhang - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
Current state-of-the-art image-text matching methods implicitly align the visual-semantic
fragments, like regions in images and words in sentences, and adopt cross-attention …

Fine-grained semantics-aware representation enhancement for self-supervised monocular depth estimation

H Jung, E Park, S Yoo - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Self-supervised monocular depth estimation has been widely studied, owing to its practical
importance and recent promising improvements. However, most works suffer from limited …