An overview of deep-learning-based audio-visual speech enhancement and separation
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …
Survey of deep learning paradigms for speech processing
KB Bhangale, M Kothandaraman - Wireless Personal Communications, 2022 - Springer
Over the past decades, a particular focus is given to research on machine learning
techniques for speech processing applications. However, in the past few years, research …
Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation
Recent studies in deep learning-based speech separation have proven the superiority of
time-domain approaches to conventional time-frequency-based methods. Unlike the time …
Dual-path transformer network: Direct context-aware modeling for end-to-end monaural speech separation
The dominant speech separation models are based on complex recurrent or convolution
neural network that model speech sequences indirectly conditioning on context, such as …
SDR–half-baked or well done?
In speech enhancement and source separation, signal-to-noise ratio is a ubiquitous
objective measure of denoising/separation quality. A decade ago, the BSS_eval toolkit was …
Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation
Single-channel, speaker-independent speech separation methods have recently seen great
progress. However, the accuracy, latency, and computational cost of such methods remain …
TasNet: time-domain audio separation network for real-time, single-channel speech separation
Robust speech processing in multi-talker environments requires effective speech
separation. Recent deep learning systems have made significant progress toward solving …
Wavesplit: End-to-end speech separation by speaker clustering
We introduce Wavesplit, an end-to-end source separation system. From a single mixture, the
model infers a representation for each source and then estimates each source signal given …
Ablation studies in artificial neural networks
Ablation studies have been widely used in the field of neuroscience to tackle complex
biological systems such as the extensively studied Drosophila central nervous system, the …
Voice separation with an unknown number of multiple speakers
We present a new method for separating a mixed audio sequence, in which multiple voices
speak simultaneously. The new method employs gated neural networks that are trained to …