FreeVC: Towards high-quality text-free one-shot voice conversion
J Li, W Tu, L Xiao - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
Voice conversion (VC) can be achieved by first extracting source content information and
target speaker information, and then reconstructing waveform with these information …
SUPERB-SG: Enhanced speech processing universal performance benchmark for semantic and generative capabilities
Transfer learning has proven to be crucial in advancing the state of speech and natural
language processing research in recent years. In speech, a model pre-trained by self …
A large-scale evaluation of speech foundation models
The foundation model paradigm leverages a shared foundation model to achieve state-of-
the-art (SOTA) performance for various tasks, requiring minimal downstream-specific data …
From discrete tokens to high-fidelity audio using multi-band diffusion
Deep generative models can generate high-fidelity audio conditioned on various types of
representations (e.g., mel-spectrograms, Mel-frequency Cepstral Coefficients (MFCC)) …
PARP: Prune, adjust and re-prune for self-supervised speech recognition
Self-supervised speech representation learning (speech SSL) has demonstrated the benefit
of scale in learning rich representations for Automatic Speech Recognition (ASR) with …
DDDM-VC: Decoupled denoising diffusion models with disentangled representation and prior mixup for verified robust voice conversion
Diffusion-based generative models have recently exhibited powerful generative
performance. However, as many attributes exist in the data distribution and owing to several …
ZMM-TTS: Zero-shot multilingual and multispeaker speech synthesis conditioned on self-supervised discrete speech representations
Neural text-to-speech (TTS) has achieved human-like synthetic speech for single-speaker,
single-language synthesis. Multilingual TTS systems are limited to resource-rich languages …
Self-supervised ASR models and features for dysarthric and elderly speech recognition
Self-supervised learning (SSL) based speech foundation models have been applied to a
wide range of ASR tasks. However, their application to dysarthric and elderly speech via …
Efficient domain adaptation for speech foundation models
Foundation models (FMs), that are trained on broad data at scale and are adaptable to a
wide range of downstream tasks, have brought large interest in the research community …
ACE-VC: Adaptive and controllable voice conversion using explicitly disentangled self-supervised speech representations
In this work, we propose a zero-shot voice conversion method using speech representations
trained with self-supervised learning. First, we develop a multi-task model to decompose a …