FitHuBERT: Going thinner and deeper for knowledge distillation of speech self-supervised learning
Adapting TTS models for new speakers using transfer learning
Training neural text-to-speech (TTS) models for a new speaker typically requires several
hours of high quality speech data. Prior works on voice cloning attempt to address this …
Automatic Fluency Assessment Method for Spontaneous Speech without Reference Text
J Liu, A Wumaier, C Fan, S Guo - Electronics, 2023 - mdpi.com
The automatic fluency assessment of spontaneous speech without reference text is a
challenging task that heavily depends on the accuracy of automatic speech recognition …
One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification
The application of speech self-supervised learning (SSL) models has achieved remarkable
performance in speaker verification (SV). However, there is a computational cost hurdle in …
Multi-task wav2vec2 serving as a pronunciation training system for children
Computer-assisted learning tools (CAPT) are increasingly reliant on AI tools. Recent studies
demonstrated how neural systems pre-trained in a self-supervised fashion, such as …
SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning
Self-supervised learning (SSL) has achieved remarkable success across various speech-
processing tasks. To enhance its efficiency, previous works often leverage the use of …
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
We propose SelfVC, a training strategy to iteratively improve a voice conversion model with
self-synthesized examples. Previous efforts on voice conversion focus on explicitly …