Dual-path transformer network: Direct context-aware modeling for end-to-end monaural speech separation

J Chen, Q Mao, D Liu - arXiv preprint arXiv:2007.13975, 2020 - arxiv.org
The dominant speech separation models are based on complex recurrent or convolutional
neural networks that model speech sequences indirectly, conditioning on context, such as …

Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation

Y Luo, N Mesgarani - IEEE/ACM Transactions on Audio, Speech …, 2019 - ieeexplore.ieee.org
Single-channel, speaker-independent speech separation methods have recently seen great
progress. However, the accuracy, latency, and computational cost of such methods remain …

Wavesplit: End-to-end speech separation by speaker clustering

N Zeghidour, D Grangier - IEEE/ACM Transactions on Audio …, 2021 - ieeexplore.ieee.org
We introduce Wavesplit, an end-to-end source separation system. From a single mixture, the
model infers a representation for each source and then estimates each source signal given …

Voice separation with an unknown number of multiple speakers

E Nachmani, Y Adi, L Wolf - International Conference on …, 2020 - proceedings.mlr.press
We present a new method for separating a mixed audio sequence, in which multiple voices
speak simultaneously. The new method employs gated neural networks that are trained to …

SpEx: Multi-scale time domain speaker extraction network

C Xu, W Rao, ES Chng, H Li - IEEE/ACM Transactions on Audio …, 2020 - ieeexplore.ieee.org
Speaker extraction aims to mimic humans' selective auditory attention by extracting a target
speaker's voice from a multi-talker environment. It is common to perform the extraction in …

SpEx+: A complete time domain speaker extraction network

M Ge, C Xu, L Wang, ES Chng, J Dang, H Li - arXiv preprint arXiv …, 2020 - arxiv.org
Speaker extraction aims to extract the target speech signal from a multi-talker environment
given a target speaker's reference speech. We recently proposed a time-domain solution …

Divide and conquer: A deep CASA approach to talker-independent monaural speaker separation

Y Liu, DL Wang - IEEE/ACM Transactions on Audio, Speech …, 2019 - ieeexplore.ieee.org
We address talker-independent monaural speaker separation from the perspectives of deep
learning and computational auditory scene analysis (CASA). Specifically, we decompose …

Real-time single-channel dereverberation and separation with time-domain audio separation network

Y Luo, N Mesgarani - Interspeech, 2018 - isca-archive.org
We investigate the recently proposed Time-domain Audio Separation Network (TasNet) in
the task of real-time single-channel speech dereverberation. Unlike systems that take …

Neural Speaker Extraction with Speaker-Speech Cross-Attention Network

W Wang, C Xu, M Ge, H Li - Interspeech, 2021 - isca-archive.org
In this paper, we propose a novel time-domain speaker-speech cross-attention network as a
variant of SpEx [1] architecture, that features speaker-speech cross-attention. The …

Deep Attention Gated Dilated Temporal Convolutional Networks with Intra-Parallel Convolutional Modules for End-to-End Monaural Speech Separation

Z Shi, H Lin, L Liu, R Liu, J Han, A Shi - Interspeech, 2019 - isca-archive.org
Monaural speech separation techniques are far from satisfactory and are a challenging task
due to interference from multiple sources. Recently the deep dilated temporal convolutional …