MFA-Conformer: Multi-scale feature aggregation conformer for automatic speaker verification

Y Zhang, Z Lv, H Wu, S Zhang, P Hu, Z Wu… - ar…

Y Koizumi, H Zen, K Yatabe, N Chen… - arXiv preprint arXiv …, 2022 - arxiv.org
Neural vocoder using denoising diffusion probabilistic model (DDPM) has been improved by
adaptation of the diffusion noise distribution to given acoustic features. In this study, we …

Miipher: A robust speech restoration model integrating self-supervised speech and text representations

Y Koizumi, H Zen, S Karita, Y Ding… - … IEEE Workshop on …, 2023 - ieeexplore.ieee.org
Speech restoration (SR) is a task of converting degraded speech signals into high-quality
ones. In this study, we propose a robust SR model called Miipher, and apply Miipher to a …

NSE-CATNet: deep neural speech enhancement using convolutional attention transformer network

N Saleem, TS Gunawan, M Kartiwi, BS Nugroho… - IEEE …, 2023 - ieeexplore.ieee.org
Speech enhancement (SE) is a critical aspect of various speech-processing applications.
Recent research in this field focuses on identifying effective ways to capture the long-term …

DeFT-AN: Dense frequency-time attentive network for multichannel speech enhancement

D Lee, JW Choi - IEEE Signal Processing Letters, 2023 - ieeexplore.ieee.org
In this study, we propose a dense frequency-time attentive network (DeFT-AN) for
multichannel speech enhancement. DeFT-AN is a mask estimation network that predicts a …

Exploring self-attention mechanisms for speech separation

C Subakan, M Ravanelli, S Cornell… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Transformers have enabled impressive improvements in deep learning. They often
outperform recurrent and convolutional models in many tasks while taking advantage of …