Unifying knowledge iterative dissemination and relational reconstruction network for image–text matching

X Xie, Z Li, Z Tang, D Yao, H Ma - Information Processing & Management, 2023 - Elsevier
Image–text matching is a crucial branch in multimedia retrieval which relies on learning inter-
modal correspondences. Most existing methods focus on global or local correspondence …

Heterogeneous Graph Fusion Network for cross-modal image-text retrieval

X Qin, L Li, G Pang, F Hao - Expert Systems with Applications, 2024 - Elsevier
Exploring the semantic correspondence of image-text pairs is significant as it bridges vision
and language. Most prior works focus on global semantic alignment or local semantic …

Multi-level knowledge-driven feature representation and triplet loss optimization network for image–text retrieval

X Qin, L Li, F Hao, M Ge, G Pang - Information Processing & Management, 2024 - Elsevier
Image–text retrieval plays a considerable role in associating vision and language. Existing
mainstream approaches focus on fine-grained alignment while ignoring the influence of …

Bridging the cross-modality semantic gap in visual question answering

B Wang, Y Ma, X Li, J Gao, Y Hu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The objective of visual question answering (VQA) is to adequately comprehend a question
and identify relevant contents in an image that can provide an answer. Existing approaches …

Multi-level Symmetric Semantic Alignment Network for image–text matching

W Wang, X Di, M Liu, F Gao - Neurocomputing, 2024 - Elsevier
Image–text matching has attracted much attention as one of the visual-linguistic tasks. Most
of the existing methods tend to concentrate on single-level semantic similarity by global …

Multi-scale motivated neural network for image-text matching

X Qin, L Li, G Pang - Multimedia Tools and Applications, 2024 - Springer
Existing mainstream image-text matching methods usually measure the relevance of image-
text pairs by capturing and aggregating the affinities between textual words and visual …

Multi-task visual semantic embedding network for image-text retrieval

XY Qin, LS Li, JY Tang, F Hao, ML Ge… - Journal of Computer …, 2024 - Springer
Image-text retrieval aims to capture the semantic correspondence between images and
texts, which serves as a foundation and crucial component in multi-modal recommendations …

Global-guided asymmetric attention network for image-text matching

D Wu, H Li, Y Tang, L Guo, H Liu - Neurocomputing, 2022 - Elsevier
Image-text matching is a vital yet challenging task in the field of vision and language. Unlike
previous methods that usually adopt a symmetrical network to independently embed images …

Cross-modal information balance-aware reasoning network for image-text retrieval

X Qin, L Li, F Hao, G Pang, Z Wang - Engineering Applications of Artificial …, 2023 - Elsevier
As a fundamental multimodal task, image-text retrieval bridges the gap between vision and
language. Current mainstream methods exploit attention mechanisms to discover potential …

Visual Contextual Semantic Reasoning for Cross-Modal Drone Image-Text Retrieval

J Huang, Y Chen, S Xiong, X Lu - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The cross-modal drone image-text (DIT) retrieval task involves using either text or drone
images as queries to retrieve relevant drone images or corresponding text. The primary …