A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

A survey on audio diffusion models: Text to speech synthesis and enhancement in generative AI

C Zhang, C Zhang, S Zheng, M Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative AI has demonstrated impressive performance in various fields, among which
speech synthesis is an interesting direction. With the diffusion model as the most popular …

Diffusion Models for Audio Restoration: A review [Special Issue On Model-Based and Data-Driven Audio Signal Processing]

JM Lemercier, J Richter, S Welker… - IEEE Signal …, 2025 - ieeexplore.ieee.org
With the development of audio playback devices and fast data transmission, the demand for
high sound quality is rising for both entertainment and communications. In this quest for …

Multi-source diffusion models for simultaneous music generation and separation

G Mariani, I Tallini, E Postolache, M Mancusi… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we define a diffusion-based generative model capable of both music synthesis
and source separation by learning the score of the joint probability density of sources …

Generative pre-training for speech with flow matching

AH Liu, M Le, A Vyas, B Shi, A Tjandra… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative models have gained more and more attention in recent years for their
remarkable success in tasks that required estimating and sampling data distribution to …

Separate and diffuse: Using a pretrained diffusion model for improving source separation

S Lutati, E Nachmani, L Wolf - arXiv preprint arXiv:2301.10752, 2023 - arxiv.org
The problem of speech separation, also known as the cocktail party problem, refers to the
task of isolating a single speech signal from a mixture of speech signals. Previous work on …

Target speech extraction with conditional diffusion model

N Kamo, M Delcroix, T Nakatani - arXiv preprint arXiv:2308.03987, 2023 - arxiv.org
Diffusion model-based speech enhancement has received increased attention since it can
generate very natural enhanced signals and generalizes well to unseen conditions …

Robust time series denoising with learnable wavelet packet transform

G Frusque, O Fink - Advanced Engineering Informatics, 2024 - Elsevier
Noise in the data is one of the main causes of model performance drop. Denoising is
therefore a critical step in most data pipelines. In this paper we propose to fuse the learning …

Schrödinger Bridge for Generative Speech Enhancement

A Jukić, R Korostik, J Balam, B Ginsburg - arXiv preprint arXiv:2407.16074, 2024 - arxiv.org
This paper proposes a generative speech enhancement model based on Schrödinger
bridge (SB). The proposed model is employing a tractable SB to formulate a data-to-data …

Seeing through the conversation: Audio-visual speech separation based on diffusion model

S Lee, C Jung, Y Jang, J Kim… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
The objective of this work is to extract the target speaker's voice from a mixture of voices
using visual cues. Existing works on audio-visual speech separation have demonstrated …