Cross-modal retrieval: a systematic review of methods and future directions
With the exponential surge in diverse multimodal data, traditional unimodal retrieval
methods struggle to meet the needs of users seeking access to data across various …
A survey of audio-based music classification and annotation
Music information retrieval (MIR) is an emerging research area that receives growing
attention from both the research community and music industry. It addresses the problem of …
Use what you have: Video retrieval using representations from collaborative experts
The rapid growth of video on the internet has made searching for video content using natural
language queries a significant challenge. Human-generated queries for video datasets in the …
Learning audio-video modalities from image captions
There has been a recent explosion of large-scale image-text datasets, as images with alt-
text captions can be easily obtained online. Obtaining large-scale, high quality data for video …
Audio retrieval with natural language queries: A benchmark study
The objectives of this work are cross-modal text-audio and audio-text retrieval, in which the
goal is to retrieve the audio content from a pool of candidates that best matches a given …
DeepEar: robust smartphone audio sensing in unconstrained acoustic environments using deep learning
Microphones are remarkably powerful sensors of human behavior and context. However,
audio sensing is highly susceptible to wild fluctuations in accuracy when used in diverse …
Robust sound event classification using deep neural networks
The automatic recognition of sound events by computers is an important aspect of emerging
applications such as automated surveillance, machine hearing and auditory scene …
Audio retrieval with natural language queries
We consider the task of retrieving audio using free-form natural language queries. To study
this problem, which has received limited attention in the existing literature, we introduce …
Improving cross-modal retrieval with set of diverse embeddings
Cross-modal retrieval across image and text modalities is a challenging task due to its
inherent ambiguity: An image often exhibits various situations, and a caption can be coupled …
On metric learning for audio-text cross-modal retrieval
Audio-text retrieval aims at retrieving a target audio clip or caption from a pool of candidates
given a query in another modality. Solving such cross-modal retrieval task is challenging …