Computational bioacoustics with deep learning: a review and roadmap
D Stowell - PeerJ, 2022 - peerj.com
Animal vocalisations and natural soundscapes are fascinating objects of study, and contain
valuable evidence about animal behaviours, populations and ecosystems. They are studied …
Human action recognition from various data modalities: A review
Human Action Recognition (HAR) aims to understand human behavior and assign a label to
each action. It has a wide range of applications, and therefore has been attracting increasing …
AudioLDM: Text-to-audio generation with latent diffusion models
Text-to-audio (TTA) system has recently gained attention for its ability to synthesize general
audio based on text descriptions. However, previous studies in TTA have limited generation …
VideoPoet: A large language model for zero-shot video generation
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …
Retrieval-augmented generation for AI-generated content: A survey
The development of Artificial Intelligence Generated Content (AIGC) has been facilitated by
advancements in model algorithms, scalable foundation model architectures, and the …
Noise2Music: Text-conditioned music generation with diffusion models
We introduce Noise2Music, where a series of diffusion models is trained to generate
high-quality 30-second music clips from text prompts. Two types of diffusion models, a generator …
AudioLDM 2: Learning holistic audio generation with self-supervised pretraining
Although audio generation shares commonalities across different types of audio, such as
speech, music, and sound effects, designing models for each type requires careful …
Masked autoencoders that listen
This paper studies a simple extension of image-based Masked Autoencoders (MAE) to
self-supervised representation learning from audio spectrograms. Following the Transformer …
Fast timing-conditioned latent audio diffusion
Generating long-form 44.1 kHz stereo audio from text prompts can be computationally
demanding. Further, most previous works do not tackle that music and sound effects …
Attention bottlenecks for multimodal fusion
Humans perceive the world by concurrently processing and fusing high-dimensional inputs
from multiple modalities such as vision and audio. Machine perception models, in stark …