Cross-modal retrieval: a systematic review of methods and future directions
With the exponential surge in diverse multimodal data, traditional unimodal retrieval
methods struggle to meet the needs of users seeking access to data across various …
Automated audio captioning: An overview of recent progress and new challenges
Automated audio captioning is a cross-modal translation task that aims to generate natural
language descriptions for given audio clips. This task has received increasing attention with …
Large-scale contrastive language-audio pretraining with feature fusion and keyword-to-caption augmentation
Contrastive learning has shown remarkable success in the field of multimodal
representation learning. In this paper, we propose a pipeline of contrastive language-audio …
WavCaps: A ChatGPT-assisted weakly-labelled audio captioning dataset for audio-language multimodal research
The advancement of audio-language (AL) multimodal learning tasks has been significant in
recent years, yet the limited size of existing audio-language datasets poses challenges for …
ONE-PEACE: Exploring one general representation model toward unlimited modalities
In this work, we explore a scalable way for building a general representation model toward
unlimited modalities. We release ONE-PEACE, a highly extensible model with 4B …
Separate anything you describe
Language-queried audio source separation (LASS) is a new paradigm for computational
auditory scene analysis (CASA). LASS aims to separate a target sound from an audio …
Improving text-audio retrieval by text-aware attention pooling and prior matrix revised loss
In text-audio retrieval (TAR) tasks, due to the heterogeneity of contents between text and
audio, the semantic information contained in the text is only similar to certain frames within …
Audio retrieval with WavText5K and CLAP training
Audio-Text retrieval takes a natural language query to retrieve relevant audio files in a
database. Conversely, Text-Audio retrieval takes an audio file as a query to retrieve relevant …
Modality-independent teachers meet weakly-supervised audio-visual event parser
Audio-visual learning has been a major pillar of multi-modal machine learning, where the
community mostly focused on its modality-aligned setting, i.e., the audio …
FLAP: Fast language-audio pre-training
We propose Fast Language-Audio Pre-training (FLAP), a self-supervised approach that
efficiently and effectively learns aligned audio and language representations through …