A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
A survey on audio diffusion models: Text to speech synthesis and enhancement in generative ai
Generative AI has demonstrated impressive performance in various fields, among which
speech synthesis is an interesting direction. With the diffusion model as the most popular …
Diffusion Models for Audio Restoration: A review [Special Issue On Model-Based and Data-Driven Audio Signal Processing]
With the development of audio playback devices and fast data transmission, the demand for
high sound quality is rising for both entertainment and communications. In this quest for …
Multi-source diffusion models for simultaneous music generation and separation
In this work, we define a diffusion-based generative model capable of both music synthesis
and source separation by learning the score of the joint probability density of sources …
Generative pre-training for speech with flow matching
Generative models have gained more and more attention in recent years for their
remarkable success in tasks that required estimating and sampling data distribution to …
Separate and diffuse: Using a pretrained diffusion model for improving source separation
The problem of speech separation, also known as the cocktail party problem, refers to the
task of isolating a single speech signal from a mixture of speech signals. Previous work on …
Target speech extraction with conditional diffusion model
Diffusion model-based speech enhancement has received increased attention since it can
generate very natural enhanced signals and generalizes well to unseen conditions …
Robust time series denoising with learnable wavelet packet transform
Noise in the data is one of the main causes of model performance drop. Denoising is
therefore a critical step in most data pipelines. In this paper we propose to fuse the learning …
Schrödinger Bridge for Generative Speech Enhancement
This paper proposes a generative speech enhancement model based on Schrödinger
bridge (SB). The proposed model is employing a tractable SB to formulate a data-to-data …
Seeing through the conversation: Audio-visual speech separation based on diffusion model
The objective of this work is to extract the target speaker's voice from a mixture of voices
using visual cues. Existing works on audio-visual speech separation have demonstrated …