Neural target speech extraction: An overview

K Zmolikova, M Delcroix, T Ochiai… - IEEE Signal …, 2023 - ieeexplore.ieee.org
Humans can listen to a target speaker even in challenging acoustic conditions that have
noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail …

Wavesplit: End-to-end speech separation by speaker clustering

N Zeghidour, D Grangier - IEEE/ACM Transactions on Audio …, 2021 - ieeexplore.ieee.org
We introduce Wavesplit, an end-to-end source separation system. From a single mixture, the
model infers a representation for each source and then estimates each source signal given …

Weakly-supervised audio-visual segmentation

S Mo, B Raj - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
Audio-visual segmentation is a challenging task that aims to predict pixel-level masks for
sound sources in a video. Previous work applied a comprehensive manually designed …

Improving universal sound separation using sound classification

E Tzinis, S Wisdom, JR Hershey… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Deep learning approaches have recently achieved impressive performance on both audio
source separation and sound classification. Most audio source separation approaches focus …

Meta-learning extractors for music source separation

D Samuel, A Ganeshan… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
We propose a hierarchical meta-learning-inspired model for music source separation
(Meta-TasNet) in which a generator model is used to predict the weights of individual extractor …

Move2hear: Active audio-visual source separation

S Majumder, Z Al-Halah… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
We introduce the active audio-visual source separation problem, where an agent must move
intelligently in order to better isolate the sounds coming from an object of interest in its …

SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning

M Delcroix, JB Vázquez, T Ochiai… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
In many situations, we would like to hear desired sound events (SEs) while being able to
ignore interference. Target sound extraction (TSE) tackles this problem by estimating the …

A unified model for zero-shot music source separation, transcription and synthesis

L Lin, Q Kong, J Jiang, G Xia - arXiv preprint arXiv:2108.03456, 2021 - arxiv.org
We propose a unified model for three inter-related tasks: 1) to separate individual
sound sources from a mixed music audio, 2) to transcribe each sound source to MIDI …

Universal source separation with weakly labelled data

Q Kong, K Chen, H Liu, X Du, T Berg-Kirkpatrick… - arXiv preprint arXiv …, 2023 - arxiv.org
Universal source separation (USS) is a fundamental research task for computational
auditory scene analysis, which aims to separate mono recordings into individual source …

Heterogeneous target speech separation

E Tzinis, G Wichern, A Subramanian… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce a new paradigm for single-channel target source separation where the
sources of interest can be distinguished using non-mutually exclusive concepts (e.g. …