An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Embodied AI‐driven operation of smart cities: a concise review

F Shenavarmasouleh, FG Mohammadi… - Cyberphysical Smart …, 2022 - Wiley Online Library
An undeniable part of a smart city is its use of smart agents. These agents can vary a lot in
sizes, shapes, and functionalities. Embodied artificial intelligence is the field of study that …

Conditioned source separation for musical instrument performances

O Slizovskaia, G Haro, E Gómez - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
In music source separation, the number of sources may vary for each piece and some of the
sources may belong to the same family of instruments, thus sharing timbral characteristics …

End-to-end sound source separation conditioned on instrument labels

O Slizovskaia, L Kim, G Haro… - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
Can we perform an end-to-end music source separation with a variable number of sources
using a deep learning model? This paper presents an extension of the Wave-U-Net [1] …

Audiovisual analysis of music performances: Overview of an emerging field

Z Duan, S Essid, CCS Liem, G Richard… - IEEE Signal …, 2018 - ieeexplore.ieee.org
In the physical sciences and engineering domains, music has traditionally been considered
an acoustic phenomenon. From a perceptual viewpoint, music is naturally associated with …

Solos: A dataset for audio-visual music analysis

JF Montesinos, O Slizovskaia… - 2020 IEEE 22nd …, 2020 - ieeexplore.ieee.org
In this paper, we present a new dataset of music performance videos which can be used for
training machine learning methods for multiple tasks such as audio-visual blind source …

Neuro-steered music source separation with EEG-based auditory attention decoding and contrastive-NMF

G Cantisani, S Essid, G Richard - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
We propose a novel informed music source separation paradigm, which can be referred to
as neuro-steered music source separation. More precisely, the source separation process is …

TACO: Training-free Sound Prompted Segmentation via Deep Audio-visual CO-factorization

H Malard, M Olvera, S Lathuiliere, S Essid - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale pre-trained audio and image models demonstrate an unprecedented degree of
generalization, making them suitable for a wide range of applications. Here, we tackle the …

User-guided one-shot deep model adaptation for music source separation

G Cantisani, A Ozerov, S Essid… - 2021 IEEE Workshop on …, 2021 - ieeexplore.ieee.org
Music source separation is the task of isolating individual instruments which are mixed in a
musical piece. This task is particularly challenging, and even state-of-the-art models can …

Online audio-visual source association for chamber music performances

B Li, K Dinesh, C Xu, G Sharma, Z Duan - Transactions of the …, 2019 - par.nsf.gov
In audio-visual recordings of music performances, visual cues from instrument players
exhibit good temporal correspondence with the audio signals and the music content. These …