A review of speaker diarization: Recent advances with deep learning
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …
Sixty years of frequency-domain monaural speech enhancement: From traditional to deep learning methods
Frequency-domain monaural speech enhancement has been extensively studied for over
60 years, and a great number of methods have been proposed and applied to many …
TF-GridNet: Integrating full- and sub-band modeling for speech separation
We propose TF-GridNet for speech separation. The model is a novel deep neural network
(DNN) integrating full- and sub-band modeling in the time-frequency (TF) domain. It stacks …
CHiME-6 challenge: Tackling multispeaker speech recognition for unsegmented recordings
Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges we organize the
6th CHiME Speech Separation and Recognition Challenge (CHiME-6). The new challenge …
Machine learning in acoustics: Theory and applications
Acoustic data provide scientific and engineering insights in fields ranging from biology and
communications to ocean and Earth science. We survey the recent advances and …
A survey on audio diffusion models: Text to speech synthesis and enhancement in generative AI
Generative AI has demonstrated impressive performance in various fields, among which
speech synthesis is an interesting direction. With the diffusion model as the most popular …
GibbsDDRM: A partially collapsed Gibbs sampler for solving blind inverse problems with denoising diffusion restoration
Pre-trained diffusion models have been successfully used as priors in a variety of linear
inverse problems, where the goal is to reconstruct a signal from noisy linear measurements …
HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks
Real-world audio recordings are often degraded by factors such as noise, reverberation,
and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to …
SpatialNet: Extensively learning spatial information for multichannel joint speech separation, denoising and dereverberation
This work proposes a neural network to extensively exploit spatial information for
multichannel joint speech separation, denoising and dereverberation, named SpatialNet. In …
Multichannel long-term streaming neural speech enhancement for static and moving speakers
In this work, we extend our previously proposed offline SpatialNet for long-term streaming
multichannel speech enhancement in both static and moving speaker scenarios. SpatialNet …