ESPnet-Codec: Comprehensive training and evaluation of neural codecs for audio, music, and speech
Neural codecs have become crucial to recent speech and audio generation research. In
addition to signal compression capabilities, discrete codecs have also been found to …
Preference tuning with human feedback on language, speech, and vision tasks: A survey
Preference tuning is a crucial process for aligning deep generative models with human
preferences. This survey offers a thorough overview of recent advancements in preference …
URGENT challenge: Universality, robustness, and generalizability for speech enhancement
The last decade has witnessed significant advancements in deep learning-based speech
enhancement (SE). However, most existing SE research has limitations on the coverage of …
MOS-Bench: Benchmarking generalization abilities of subjective speech quality assessment models
Subjective speech quality assessment (SSQA) is critical for evaluating speech samples as
perceived by human listeners. While model-based SSQA has enjoyed great success thanks …
VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music
In this work, we introduce VERSA, a unified and standardized evaluation toolkit designed for
various speech, audio, and music signals. The toolkit features a Pythonic interface with …
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
Speech quality assessment typically requires evaluating audio from multiple aspects, such
as mean opinion score (MOS) and speaker similarity (SIM) etc., which can be challenging to …
Massively Multilingual Forced Aligner Leveraging Self-Supervised Discrete Units
We propose a massively multilingual speech-to-text neural forced aligner that supports 98
languages with a single architecture. The aligner takes self-supervised discrete acoustic …
Deep Speech Synthesis from Multimodal Articulatory Representations
The amount of articulatory data available for training deep learning models is much less
compared to acoustic speech data. In order to improve articulatory-to-acoustic synthesis …
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement
We introduce AnyEnhance, a unified generative model for voice enhancement that
processes both speech and singing voices. Based on a masked generative model …
BnTTS: Few-Shot Speaker Adaptation in Low-Resource Setting
This paper introduces BnTTS (Bangla Text-To-Speech), the first framework for Bangla
speaker adaptation-based TTS, designed to bridge the gap in Bangla speech synthesis …