Turnitin
降AI改写
早检测系统
早降重系统
Turnitin-UK版
万方检测-期刊版
维普编辑部版
Grammarly检测
Paperpass检测
checkpass检测
PaperYY检测
Wavcaps: A chatgpt-assisted weakly-labelled audio captioning dataset for audio-language multimodal research
The advancement of audio-language (AL) multimodal learning tasks has been significant in
recent years, yet the limited size of existing audio-language datasets poses challenges for …
recent years, yet the limited size of existing audio-language datasets poses challenges for …
Audio-Language Datasets of Scenes and Events: A Survey
Audio-language models (ALMs) generate linguistic descriptions of sound-producing events
and scenes. Advances in dataset creation and computational power have led to significant …
and scenes. Advances in dataset creation and computational power have led to significant …
M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation
Contrastive language-audio pre-training (CLAP) enables zero-shot (ZS) inference of audio
and exhibits promising performance in several classification tasks. However, conventional …
and exhibits promising performance in several classification tasks. However, conventional …
T-CLAP: Temporal-enhanced contrastive language-audio pretraining
Contrastive language-audio pretraining (CLAP) has been developed to align the
representations of audio and language, achieving remarkable performance in retrieval and …
representations of audio and language, achieving remarkable performance in retrieval and …
Multi-Sentence Grounding for Long-term Instructional Video
In this paper, we aim to establish an automatic, scalable pipeline for denoising the large-
scale instructional dataset and construct a high-quality video-text dataset with multiple …
scale instructional dataset and construct a high-quality video-text dataset with multiple …
STA-V2A: Video-to-audio generation with semantic and temporal alignment
Visual and auditory perception are two crucial ways humans experience the world. Text-to-
video generation has made remarkable progress over the past year, but the absence of …
video generation has made remarkable progress over the past year, but the absence of …
Sound-VECaps: Improving Audio Generation With Visual Enhanced Captions
Generative models have shown significant achievements in audio generation tasks.
However, existing models struggle with complex and detailed prompts, leading to potential …
However, existing models struggle with complex and detailed prompts, leading to potential …
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
Large language models (LLMs) have demonstrated remarkable capabilities in natural
language and multimodal domains. By fine-tuning multimodal LLMs with temporal …
language and multimodal domains. By fine-tuning multimodal LLMs with temporal …
SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context
J Li, S Tao, Y Yan, X Gu, H Xu, X Zheng, Y Lyu… - arxiv preprint arxiv …, 2024 - arxiv.org
Endeavors have been made to explore Large Language Models for video analysis (Video-
LLMs), particularly in understanding and interpreting long videos. However, existing Video …
LLMs), particularly in understanding and interpreting long videos. However, existing Video …
Audio Captioning via Generative Pair-to-Pair Retrieval with Refined Knowledge Base
C Changin, L Sungjun, R Wonjong - arxiv preprint arxiv:2410.10913, 2024 - arxiv.org
Recent advances in audio understanding tasks leverage the reasoning capabilities of LLMs.
However, adapting LLMs to learn audio concepts requires massive training data and …
However, adapting LLMs to learn audio concepts requires massive training data and …