TF-GridNet: Integrating full- and sub-band modeling for speech separation
We propose TF-GridNet for speech separation. The model is a novel deep neural network
(DNN) integrating full- and sub-band modeling in the time-frequency (TF) domain. It stacks …
TF-GridNet: Making time-frequency domain models great again for monaural speaker separation
We propose TF-GridNet, a novel multi-path deep neural network (DNN) operating in the time-
frequency (TF) domain, for monaural talker-independent speaker separation in anechoic …
ESPnet-Codec: Comprehensive training and evaluation of neural codecs for audio, music, and speech
Neural codecs have become crucial to recent speech and audio generation research. In
addition to signal compression capabilities, discrete codecs have also been found to …
Diffusion-based generative speech source separation
We propose DiffSep, a new single channel source separation method based on score-
matching of a stochastic differential equation (SDE). We craft a tailored continuous time …
End-to-end integration of speech recognition, dereverberation, beamforming, and self-supervised learning representation
Self-supervised learning representation (SSLR) has demonstrated its significant
effectiveness in automatic speech recognition (ASR), mainly with clean speech. Recent work …
Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation
Neural speech separation has made remarkable progress and its integration with automatic
speech recognition (ASR) is an important direction towards realizing multi-speaker ASR …
Speech generation for indigenous language education
As the quality of contemporary speech synthesis improves, so too does the interest from
language communities in developing text-to-speech (TTS) systems for a variety of real-world …
Multi-channel target speaker extraction with refinement: The WAVLab submission to the second clarity enhancement challenge
This paper describes our submission to the Second Clarity Enhancement Challenge
(CEC2), which consists of target speech enhancement for hearing-aid (HA) devices in noisy …
LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models
The performance of speaker verification (SV) models may drop dramatically in noisy
environments. A speech enhancement (SE) module can be used as a front-end strategy …
Speech generation for indigenous language education
RK Kazantsevaa, R Kuhna, S Larkina… - Computer Speech & …, 2024 - docs.everyvoice.ca
The vast majority of the world's languages are unable to follow in the footsteps of existing
resource-intensive pathways to building text-to-speech (TTS) systems. But, as the quality of …