Dual-path transformer network: Direct context-aware modeling for end-to-end monaural speech separation
The dominant speech separation models are based on complex recurrent or convolutional
neural networks that model speech sequences indirectly, conditioning on context, such as …
Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation
Single-channel, speaker-independent speech separation methods have recently seen great
progress. However, the accuracy, latency, and computational cost of such methods remain …
Wavesplit: End-to-end speech separation by speaker clustering
We introduce Wavesplit, an end-to-end source separation system. From a single mixture, the
model infers a representation for each source and then estimates each source signal given …
Voice separation with an unknown number of multiple speakers
We present a new method for separating a mixed audio sequence, in which multiple voices
speak simultaneously. The new method employs gated neural networks that are trained to …
SpEx: Multi-scale time domain speaker extraction network
Speaker extraction aims to mimic humans' selective auditory attention by extracting a target
speaker's voice from a multi-talker environment. It is common to perform the extraction in …
SpEx+: A complete time domain speaker extraction network
Speaker extraction aims to extract the target speech signal from a multi-talker environment
given a target speaker's reference speech. We recently proposed a time-domain solution …
Divide and conquer: A deep CASA approach to talker-independent monaural speaker separation
We address talker-independent monaural speaker separation from the perspectives of deep
learning and computational auditory scene analysis (CASA). Specifically, we decompose …
Real-time single-channel dereverberation and separation with time-domain audio separation network
We investigate the recently proposed Time-domain Audio Separation Network (TasNet) in
the task of real-time single-channel speech dereverberation. Unlike systems that take …
Neural Speaker Extraction with Speaker-Speech Cross-Attention Network
In this paper, we propose a novel time-domain speaker-speech cross-attention network as a
variant of SpEx [1] architecture, that features speaker-speech cross-attention. The …
Deep Attention Gated Dilated Temporal Convolutional Networks with Intra-Parallel Convolutional Modules for End-to-End Monaural Speech Separation
Monaural speech separation techniques are far from satisfactory and are a challenging task
due to interference from multiple sources. Recently the deep dilated temporal convolutional …