A review of multimodal image matching: Methods and applications
Multimodal image matching, which refers to identifying and then corresponding the same or
similar structure/content from two or more images that are of significant modalities or …
Image-text retrieval: A survey on recent research and development
In the past few years, cross-modal image-text retrieval (ITR) has experienced increased
interest in the research community due to its excellent research value and broad real-world …
Similarity reasoning and filtration for image-text matching
Image-text matching plays a critical role in bridging the vision and language, and great
progress has been made by exploiting the global alignment between image and sentence …
Towards artificial general intelligence via a multimodal foundation model
The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of
human. Despite tremendous success in the AI research, most of existing methods have only …
Probabilistic embeddings for cross-modal retrieval
Cross-modal retrieval methods build a common representation space for samples from
multiple modalities, typically from the vision and the language domains. For images and …
Changer: Feature interaction is what you need for change detection
S Fang, K Li, Z Li - IEEE Transactions on Geoscience and …, 2023 - ieeexplore.ieee.org
Change detection is an important tool for long-term Earth observation missions. It takes bi-
temporal images as input and predicts “where” the change has occurred. Different from other …
Dynamic modality interaction modeling for image-text retrieval
Image-text retrieval is a fundamental and crucial branch in information retrieval. Although
much progress has been made in bridging vision and language, it remains challenging …
Deep learning for cross-domain data fusion in urban computing: Taxonomy, advances, and outlook
As cities continue to burgeon, Urban Computing emerges as a pivotal discipline for
sustainable development by harnessing the power of cross-domain data fusion from diverse …
Fine-grained image-text matching by cross-modal hard aligning network
Z Pan, F Wu, B Zhang - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
Current state-of-the-art image-text matching methods implicitly align the visual-semantic
fragments, like regions in images and words in sentences, and adopt cross-attention …
Fine-grained semantics-aware representation enhancement for self-supervised monocular depth estimation
Self-supervised monocular depth estimation has been widely studied, owing to its practical
importance and recent promising improvements. However, most works suffer from limited …