An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

One-shot conditional audio filtering of arbitrary sounds

B Gfeller, D Roblek… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
We consider the problem of separating a particular sound source from a single-channel
mixture, based on only a short sample of the target source (from the same recording). Using …

Hierarchic temporal convolutional network with cross-domain encoder for music source separation

Y Hu, Y Chen, W Yang, L He… - IEEE Signal Processing …, 2022 - ieeexplore.ieee.org
Recently, the time-domain-based methods (i.e., the method of modeling the raw waveform
directly) for audio source separation have shown tremendous potential. In this paper, we …

VoViT: Low latency graph-based audio-visual voice separation transformer

JF Montesinos, VS Kadandale, G Haro - European Conference on …, 2022 - Springer
This paper presents an audio-visual approach for voice separation which produces state-of-
the-art results at a low latency in two scenarios: speech and singing voice. The model is …

Heterogeneous target speech separation

E Tzinis, G Wichern, A Subramanian… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce a new paradigm for single-channel target source separation where the
sources of interest can be distinguished using non-mutually exclusive concepts (e.g. …

Hierarchical Musical Instrument Separation

E Manilow, G Wichern, J Le Roux - ISMIR, 2020 - program.ismir2020.net
Many sounds that humans encounter are hierarchical in nature; a piano note is one of many
played during a performance, which is one of many instruments in a band, which might be …

Monaural speech separation using speaker embedding from preliminary separation

J Byun, JW Shin - IEEE/ACM Transactions on Audio, Speech …, 2021 - ieeexplore.ieee.org
In speech separation, the identities of the speakers may be an important cue to discriminate
speeches in the mixture and separate them better. A few recent researches used the …

A cappella: Audio-visual singing voice separation

JF Montesinos, VS Kadandale, G Haro - arXiv preprint arXiv:2104.09946, 2021 - arxiv.org
The task of isolating a target singing voice in music videos has useful applications. In this
work, we explore the single-channel singing voice separation problem from a multimodal …

DPM-TSE: A diffusion probabilistic model for target sound extraction

J Hai, H Wang, D Yang, K Thakkar… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Common target sound extraction (TSE) approaches primarily relied on discriminative
approaches in order to separate the target sound while minimizing interference from the …

The whole is greater than the sum of its parts: improving music source separation by bridging networks

R Sawata, N Takahashi, S Uhlich, S Takahashi… - EURASIP Journal on …, 2024 - Springer
This paper presents the crossing scheme (X-scheme) for improving the performance of deep
neural network (DNN)-based music source separation (MSS) with almost no increasing …