TF-GridNet: Integrating full- and sub-band modeling for speech separation

ZQ Wang, S Cornell, S Choi, Y Lee… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
We propose TF-GridNet for speech separation. The model is a novel deep neural network
(DNN) integrating full- and sub-band modeling in the time-frequency (TF) domain. It stacks …

TF-GridNet: Making time-frequency domain models great again for monaural speaker separation

ZQ Wang, S Cornell, S Choi, Y Lee… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We propose TF-GridNet, a novel multi-path deep neural network (DNN) operating in the time-
frequency (TF) domain, for monaural talker-independent speaker separation in anechoic …

ESPnet-Codec: Comprehensive training and evaluation of neural codecs for audio, music, and speech

J Shi, J Tian, Y Wu, J Jung, JQ Yip… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
Neural codecs have become crucial to recent speech and audio generation research. In
addition to signal compression capabilities, discrete codecs have also been found to …

Diffusion-based generative speech source separation

R Scheibler, Y Ji, SW Chung, J Byun… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We propose DiffSep, a new single channel source separation method based on score-
matching of a stochastic differential equation (SDE). We craft a tailored continuous time …

End-to-end integration of speech recognition, dereverberation, beamforming, and self-supervised learning representation

Y Masuyama, X Chang, S Cornell… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Self-supervised learning representation (SSLR) has demonstrated its significant
effectiveness in automatic speech recognition (ASR), mainly with clean speech. Recent work …

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation

Y Masuyama, X Chang, W Zhang… - … IEEE Workshop on …, 2023 - ieeexplore.ieee.org
Neural speech separation has made remarkable progress and its integration with automatic
speech recognition (ASR) is an important direction towards realizing multi-speaker ASR …

[HTML] Speech generation for indigenous language education

A Pine, E Cooper, D Guzmán, E Joanis… - Computer Speech & …, 2025 - Elsevier
As the quality of contemporary speech synthesis improves, so too does the interest from
language communities in developing text-to-speech (TTS) systems for a variety of real-world …

Multi-channel target speaker extraction with refinement: The WAVLab submission to the second clarity enhancement challenge

S Cornell, ZQ Wang, Y Masuyama, S Watanabe… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper describes our submission to the Second Clarity Enhancement Challenge
(CEC2), which consists of target speech enhancement for hearing-aid (HA) devices in noisy …

LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models

CC Lee, HW Chen, CS Chen, HM Wang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
The performance of speaker verification (SV) models may drop dramatically in noisy
environments. A speech enhancement (SE) module can be used as a front-end strategy …

[PDF] Speech generation for indigenous language education

RK Kazantseva, R Kuhn, S Larkin… - Computer Speech & …, 2024 - docs.everyvoice.ca
The vast majority of the world's languages are unable to follow in the footsteps of existing
resource-intensive pathways to building text-to-speech (TTS) systems. But, as the quality of …