Moshi: a speech-text foundation model for real-time dialogue
We introduce Moshi, a speech-text foundation model and full-duplex spoken dialogue
framework. Current systems for spoken dialogue rely on pipelines of independent …
Video and audio deepfake datasets and open issues in deepfake technology: being ahead of the curve
Z Akhtar, TL Pendyala, VS Athmakuri - Forensic Sciences, 2024 - mdpi.com
The revolutionary breakthroughs in Machine Learning (ML) and Artificial Intelligence (AI) are
extensively being harnessed across a diverse range of domains, e.g., forensic science …
Masked generative video-to-audio transformers with enhanced synchronicity
Video-to-audio (V2A) generation leverages visual-only video features to render
plausible sounds that match the scene. Importantly, the generated sound onsets should …
FireRedTTS: A foundation text-to-speech framework for industry-level generative speech applications
This work proposes FireRedTTS, a foundation text-to-speech framework, to meet the
growing demands for personalized and diverse generative speech applications. The …
HierSpeech++: Bridging the gap between semantic and acoustic representation of speech by hierarchical variational inference for zero-shot speech synthesis
Large language model (LLM)-based speech synthesis has been widely adopted in
zero-shot speech synthesis. However, they require large-scale data and possess the same …
Audio Super-Resolution With Robust Speech Representation Learning of Masked Autoencoder
This paper proposes Fre-Painter, a high-fidelity audio super-resolution system that utilizes
robust speech representation learning with various masking strategies. Recently, masked …
SpecMaskGIT: Masked generative modeling of audio spectrograms for efficient audio synthesis and beyond
Recent advances in generative models that iteratively synthesize audio clips sparked great
success in text-to-audio synthesis (TTA), but at the cost of slow synthesis speed and heavy …
MusicHiFi: Fast high-fidelity stereo vocoding
Diffusion-based audio and music generation models commonly generate music by
constructing an image representation of audio (e.g., a mel-spectrogram) and then converting …
Wave-U-Mamba: an end-to-end framework for high-quality and efficient speech super resolution
Y Lee, C Kim - arXiv preprint arXiv:2409.09337, 2024 - arxiv.org
Speech Super-Resolution (SSR) is a task of enhancing low-resolution speech signals by
restoring missing high-frequency components. Conventional approaches typically …
Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors
Vibravox is a dataset compliant with the General Data Protection Regulation (GDPR)
containing audio recordings using five different body-conduction audio sensors: two in-ear …