An overview of cross-media retrieval: Concepts, methodologies, benchmarks, and challenges

Y Peng, X Huang, Y Zhao - … on circuits and systems for video …, 2017 - ieeexplore.ieee.org
Multimedia retrieval plays an indispensable role in big data utilization. Past efforts mainly
focused on single-media retrieval. However, the requirements of users are highly flexible …

Comparative analysis on cross-modal information retrieval: A review

P Kaur, HS Pannu, AK Malhi - Computer Science Review, 2021 - Elsevier
Human beings experience life through a spectrum of modes such as vision, taste, hearing,
smell, and touch. These multiple modes are integrated for information processing in our …

Deep multimodal representation learning: A survey

W Guo, J Wang, S Wang - IEEE Access, 2019 - ieeexplore.ieee.org
Multimodal representation learning, which aims to narrow the heterogeneity gap among
different modalities, plays an indispensable role in the utilization of ubiquitous multimodal …

Dual-path convolutional image-text embeddings with instance loss

Z Zheng, L Zheng, M Garrett, Y Yang, M Xu… - ACM Transactions on …, 2020 - dl.acm.org
Matching images and sentences demands a fine understanding of both modalities. In this
article, we propose a new system to discriminatively embed the image and text to a shared …

Predicting visual features from text for image and video caption retrieval

J Dong, X Li, CGM Snoek - IEEE Transactions on Multimedia, 2018 - ieeexplore.ieee.org
This paper strives to find amidst a set of sentences the one best describing the content of a
given image or video. Different from existing works, which rely on a joint subspace for their …

Know more say less: Image captioning based on scene graphs

X Li, S Jiang - IEEE Transactions on Multimedia, 2019 - ieeexplore.ieee.org
Automatically describing the content of an image has been attracting considerable research
attention in the multimedia field. To represent the content of an image, many approaches …

Modality-specific cross-modal similarity measurement with recurrent attention network

Y Peng, J Qi, Y Yuan - IEEE Transactions on Image Processing, 2018 - ieeexplore.ieee.org
Nowadays, cross-modal retrieval plays an important role to flexibly find useful information
across different modalities of data. Effectively measuring the similarity between different …

Multitask learning for cross-domain image captioning

M Yang, W Zhao, W Xu, Y Feng, Z Zhao… - IEEE Transactions …, 2018 - ieeexplore.ieee.org
Recent artificial intelligence research has witnessed great interest in automatically
generating text descriptions of images, which are known as the image captioning task …

Information fusion in content based image retrieval: A comprehensive overview

L Piras, G Giacinto - Information Fusion, 2017 - Elsevier
An ever increasing part of communication between persons involves the use of pictures, due
to the cheap availability of powerful cameras on smartphones, and the cheap availability of …

SCH-GAN: Semi-supervised cross-modal hashing by generative adversarial network

J Zhang, Y Peng, M Yuan - IEEE Transactions on Cybernetics, 2018 - ieeexplore.ieee.org
Cross-modal hashing maps heterogeneous multimedia data into a common Hamming
space to realize fast and flexible cross-modal retrieval. Supervised cross-modal hashing …