A review of speaker diarization: Recent advances with deep learning

TJ Park, N Kanda, D Dimitriadis, KJ Han… - Computer Speech & …, 2022 - Elsevier
Speaker diarization is a task to label audio or video recordings with classes that correspond
to speaker identity, or in short, a task to identify “who spoke when”. In the early years …

Sixty years of frequency-domain monaural speech enhancement: From traditional to deep learning methods

C Zheng, H Zhang, W Liu, X Luo, A Li, X Li… - Trends in …, 2023 - journals.sagepub.com
Frequency-domain monaural speech enhancement has been extensively studied for over
60 years, and a great number of methods have been proposed and applied to many …

TF-GridNet: Integrating full- and sub-band modeling for speech separation

ZQ Wang, S Cornell, S Choi, Y Lee… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
We propose TF-GridNet for speech separation. The model is a novel deep neural network
(DNN) integrating full- and sub-band modeling in the time-frequency (TF) domain. It stacks …

CHiME-6 challenge: Tackling multispeaker speech recognition for unsegmented recordings

S Watanabe, M Mandel, J Barker, E Vincent… - arXiv preprint arXiv …, 2020 - arxiv.org
Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges, we organize the
6th CHiME Speech Separation and Recognition Challenge (CHiME-6). The new challenge …

Machine learning in acoustics: Theory and applications

MJ Bianco, P Gerstoft, J Traer, E Ozanich… - The Journal of the …, 2019 - pubs.aip.org
Acoustic data provide scientific and engineering insights in fields ranging from biology and
communications to ocean and Earth science. We survey the recent advances and …

A survey on audio diffusion models: Text to speech synthesis and enhancement in generative AI

C Zhang, C Zhang, S Zheng, M Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative AI has demonstrated impressive performance in various fields, among which
speech synthesis is an interesting direction. With the diffusion model as the most popular …

GibbsDDRM: A partially collapsed Gibbs sampler for solving blind inverse problems with denoising diffusion restoration

N Murata, K Saito, CH Lai, Y Takida… - International …, 2023 - proceedings.mlr.press
Pre-trained diffusion models have been successfully used as priors in a variety of linear
inverse problems, where the goal is to reconstruct a signal from noisy linear measurements …

HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks

J Su, Z Jin, A Finkelstein - arXiv preprint arXiv:2006.05694, 2020 - arxiv.org
Real-world audio recordings are often degraded by factors such as noise, reverberation,
and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to …

SpatialNet: Extensively learning spatial information for multichannel joint speech separation, denoising and dereverberation

C Quan, X Li - IEEE/ACM Transactions on Audio, Speech, and …, 2024 - ieeexplore.ieee.org
This work proposes a neural network to extensively exploit spatial information for
multichannel joint speech separation, denoising and dereverberation, named SpatialNet. In …

Multichannel long-term streaming neural speech enhancement for static and moving speakers

C Quan, X Li - IEEE Signal Processing Letters, 2024 - ieeexplore.ieee.org
In this work, we extend our previously proposed offline SpatialNet for long-term streaming
multichannel speech enhancement in both static and moving speaker scenarios. SpatialNet …