Cross-Modal Retrieval: A Review of Methodologies, Datasets, and Future Perspectives

Z Han, A Azman, MR Mustaffa, FB Khalid - IEEE Access, 2024 - ieeexplore.ieee.org
With the rapid development of science and technology, all types of mixed media contain
large amounts of data. Traditional single multimedia data can no longer satisfy daily …

Multi-modal knowledge hypergraph for diverse image retrieval

Y Zeng, Q **, T Bao, W Li - Proceedings of the AAAI Conference on …, 2023 - ojs.aaai.org
The task of keyword-based diverse image retrieval has received considerable attention due
to its wide demand in real-world scenarios. Existing methods either rely on a multi-stage re …

Temporally language grounding with multi-modal multi-prompt tuning

Y Zeng, N Han, K Pan, Q ** - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org
The task of temporally language grounding (TLG), aiming to locate a video moment within
an untrimmed video that matches a given textual query, has attracted considerable research …
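The snippet above defines TLG as locating the moment in an untrimmed video that matches a textual query. The sketch below makes that input/output contract concrete. It is a hypothetical brute-force baseline, not the multi-prompt tuning method of the cited paper. It assumes per-clip relevance scores for one query are already available and returns the contiguous window with the highest mean score. It also computes temporal IoU, an overlap measure commonly used to compare a predicted moment with a ground-truth moment. The names `best_moment` and `temporal_iou`, the scores, and the ground-truth span are all illustrative.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) clip indices, end exclusive


def best_moment(scores: List[float], min_len: int = 1) -> Span:
    """Exhaustively find the contiguous clip window with the highest mean relevance score."""
    n = len(scores)
    if n == 0 or not 1 <= min_len <= n:
        raise ValueError("need non-empty scores and 1 <= min_len <= len(scores)")
    # prefix[i] holds sum(scores[:i]), so each window sum costs O(1)
    prefix = [0.0]
    for s in scores:
        prefix.append(prefix[-1] + s)
    best_mean, best_span = float("-inf"), (0, min_len)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            mean = (prefix[end] - prefix[start]) / (end - start)
            if mean > best_mean:  # strict '>' keeps the earliest window on ties
                best_mean, best_span = mean, (start, end)
    return best_span


def temporal_iou(a: Span, b: Span) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# Toy similarity scores between one text query and six consecutive video clips.
scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1]
pred = best_moment(scores, min_len=2)
print(pred)  # (2, 4)
print(temporal_iou(pred, (2, 5)))  # overlap with a hypothetical ground-truth moment
```

The minimum window length matters here. With `min_len=1`, the mean is always maximized by the single highest-scoring clip, so the "moment" would collapse to one clip.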

LLM-enhanced Composed Image Retrieval: An Intent Uncertainty-aware Linguistic-Visual Dual Channel Matching Model

H Ge, Y Jiang, J Sun, K Yuan, Y Liu - ACM Transactions on Information …, 2024 - dl.acm.org
Composed image retrieval (CoIR) involves a multi-modal query of the reference image and
modification text describing the desired changes, allowing users to express image retrieval …
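The CoIR snippet describes a query made of two parts: a reference image and a text stating what should change. The sketch below shows how such a two-part query can be scored against a gallery. It uses a simple additive fusion of L2-normalized embeddings followed by cosine ranking, which is a generic baseline and not the intent-uncertainty-aware dual channel model of the cited paper. The embedding space, its three axes, and every vector are made up for illustration. Real systems would obtain the embeddings from trained image and text encoders.

```python
import math
from typing import Dict, List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def compose_query(ref_image: Vector, mod_text: Vector) -> Vector:
    """Hypothetical additive fusion: sum unit-length image and text embeddings, then renormalize."""
    if len(ref_image) != len(mod_text):
        raise ValueError("embeddings must share one dimensionality")
    img, txt = l2_normalize(ref_image), l2_normalize(mod_text)
    return l2_normalize([a + b for a, b in zip(img, txt)])


def rank_gallery(query: Vector, gallery: Dict[str, Vector]) -> List[str]:
    """Return gallery keys sorted by cosine similarity to the composed query, best first."""
    q = l2_normalize(query)
    scores = {name: sum(a * b for a, b in zip(q, l2_normalize(vec)))
              for name, vec in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy 3-D embedding space with axes (dress, red, blue); all vectors are invented.
reference = [1.0, 1.0, 0.0]      # reference image: a red dress
modification = [0.0, -1.0, 1.0]  # modification text, e.g. "make it blue instead of red"
gallery = {
    "red_dress": [1.0, 1.0, 0.0],
    "blue_dress": [1.0, 0.0, 1.0],
    "blue_shirt": [0.0, 0.0, 1.0],
}
print(rank_gallery(compose_query(reference, modification), gallery))
# ['blue_dress', 'blue_shirt', 'red_dress']
```

Neither input alone ranks `blue_dress` first: the reference image alone prefers `red_dress`. The text only specifies the change, so the two modalities have to be combined.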

Point prompt tuning for temporally language grounding

Y Zeng - Proceedings of the 45th international ACM SIGIR …, 2022 - dl.acm.org
The task of temporally language grounding (TLG) aims to locate a video moment from an
untrimmed video that matches a given textual query, which has attracted considerable …

Contrastive topic-enhanced network for video captioning

Y Zeng, Y Wang, D Liao, G Li, J Xu, H Man… - Expert Systems with …, 2024 - Elsevier
In the field of video captioning, recent works usually focus on multi-modal video content
understanding, in which transcripts are extracted from speech and are often adopted as an …

Probabilistic keyphrase generation from copy and generating spaces

Y Yao, P Yang, G Zhao, Y Ge… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Keyphrase generation is one of the most fundamental tasks in natural language processing
(NLP). Most existing works on keyphrase generation mainly focus on using holistic …

Data-driven knowledge fusion for deep multi-instance learning

YX Zhang, Z Zhou, X He, AR Adhikary… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Multi-instance learning (MIL) is a widely applied technique in practical applications that
involve complex data structures. MIL can be broadly categorized into two types: traditional …

Enhancing document image retrieval in education: Leveraging ensemble-based document image retrieval systems for improved precision

YI Alzoubi, AE Topcu, E Ozdemir - Applied Sciences, 2024 - mdpi.com
Document image retrieval (DIR) systems simplify access to digital data within printed
documents by capturing images. These systems act as bridges between print and digital …

Globally Correlation-Aware Hard Negative Generation

W Peng, H Huang, T Chen, Q Ke, G Dai… - International Journal of …, 2024 - Springer
Hard negative generation aims to generate informative negative samples that help to
determine the decision boundaries and thus facilitate advancing deep metric learning …
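The snippet says generated hard negatives help shape decision boundaries in deep metric learning. The sketch below illustrates why. A negative that is already far from the anchor yields zero hinge triplet loss and therefore no training signal, while a synthesized negative placed closer to the anchor does produce a loss. The generation step here is plain linear interpolation from the anchor toward the negative. This is a deliberately simple, generic scheme and not the globally correlation-aware method of the cited paper. The function names, points, margin, and interpolation factor `lam` are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def interpolate_negative(anchor: Vector, negative: Vector, lam: float) -> Vector:
    """Hypothetical generator: place a synthetic negative on the segment anchor -> negative.

    lam = 1 returns the original negative. Smaller lam moves the point toward the anchor,
    scaling the anchor-negative distance by lam.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    return [a + lam * (n - a) for a, n in zip(anchor, negative)]


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float) -> float:
    """Hinge triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor, positive, easy_negative = [0.0, 0.0], [1.0, 0.0], [4.0, 0.0]
print(triplet_loss(anchor, positive, easy_negative, margin=0.5))  # 0.0 (no training signal)
hard_negative = interpolate_negative(anchor, easy_negative, lam=0.25)
print(hard_negative)  # [1.0, 0.0]
print(triplet_loss(anchor, positive, hard_negative, margin=0.5))  # 0.5
```

Naive interpolation does not guarantee that the synthetic point still resembles a genuine member of the negative class. The choice of `lam` therefore trades hardness against how realistic the generated negative is.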