Cross-Modal Retrieval: A Review of Methodologies, Datasets, and Future Perspectives
With the rapid development of science and technology, all types of mixed media contain
large amounts of data. Traditional single multimedia data can no longer satisfy daily …
Multi-modal knowledge hypergraph for diverse image retrieval
Y Zeng, Q **, T Bao, W Li - Proceedings of the AAAI Conference on …, 2023 - ojs.aaai.org
The task of keyword-based diverse image retrieval has received considerable attention due
to its wide demand in real-world scenarios. Existing methods either rely on a multi-stage re …
Temporally language grounding with multi-modal multi-prompt tuning
Y Zeng, N Han, K Pan, Q ** - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org
The task of temporally language grounding (TLG), aiming to locate a video moment within
an untrimmed video that matches a given textual query, has attracted considerable research …
LLM-enhanced Composed Image Retrieval: An Intent Uncertainty-aware Linguistic-Visual Dual Channel Matching Model
Composed image retrieval (CoIR) involves a multi-modal query of the reference image and
modification text describing the desired changes, allowing users to express image retrieval …
Point prompt tuning for temporally language grounding
Y Zeng - Proceedings of the 45th international ACM SIGIR …, 2022 - dl.acm.org
The task of temporally language grounding (TLG) aims to locate a video moment from an
untrimmed video that matches a given textual query, which has attracted considerable …
Contrastive topic-enhanced network for video captioning
In the field of video captioning, recent works usually focus on multi-modal video content
understanding, in which transcripts are extracted from speech and are often adopted as an …
Probabilistic keyphrase generation from copy and generating spaces
Keyphrase generation is one of the most fundamental tasks in natural language processing
(NLP). Most existing works on keyphrase generation mainly focus on using holistic …
Data-driven knowledge fusion for deep multi-instance learning
Multi-instance learning (MIL) is a widely applied technique in practical applications that
involve complex data structures. MIL can be broadly categorized into two types: traditional …
Enhancing document image retrieval in education: Leveraging ensemble-based document image retrieval systems for improved precision
Document image retrieval (DIR) systems simplify access to digital data within printed
documents by capturing images. These systems act as bridges between print and digital …
Globally Correlation-Aware Hard Negative Generation
Hard negative generation aims to generate informative negative samples that help to
determine the decision boundaries and thus facilitate advancing deep metric learning …