USEV: Universal speaker extraction with visual cue

Z Pan, M Ge, H Li - IEEE/ACM Transactions on Audio, Speech …, 2022 - ieeexplore.ieee.org
A speaker extraction algorithm seeks to extract the target speaker's speech from a multi-talker
speech mixture. The prior studies focus mostly on speaker extraction from a highly …

NeuroHeed: Neuro-steered speaker extraction using EEG signals

Z Pan, M Borsdorf, S Cai, T Schultz… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent …

A survey on deep multi-modal learning for body language recognition and generation

L Liu, L Gao, W Lei, F Ma, X Lin, J Wang - arXiv preprint arXiv:2308.08849, 2023 - arxiv.org
Body language (BL) refers to the non-verbal communication expressed through physical
movements, gestures, facial expressions, and postures. It is a form of communication that …

Target active speaker detection with audio-visual cues

Y Jiang, R Tao, Z Pan, H Li - arXiv preprint arXiv:2305.12831, 2023 - arxiv.org
In active speaker detection (ASD), we would like to detect whether an on-screen person is
speaking based on audio-visual cues. Previous studies have primarily focused on modeling …

MSFNet: Multi-scale fusion network for brain-controlled speaker extraction

C Fan, J Zhang, H Zhang, W Xiang, J Tao, X Li… - Proceedings of the …, 2024 - dl.acm.org
Speaker extraction aims to selectively extract the target speaker from the multi-talker
environment under the guidance of auxiliary reference. Recent studies have shown that the …

Time-domain speech separation networks with graph encoding auxiliary

T Wang, Z Pan, M Ge, Z Yang… - IEEE Signal Processing …, 2023 - ieeexplore.ieee.org
End-to-end time-domain speech separation with masking strategy has shown its
performance advantage, where a 1-D convolutional layer is used as the speech encoder to …

NeuroHeed+: Improving neuro-steered speaker extraction with joint auditory attention detection

Z Pan, G Wichern, FG Germain… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Neuro-steered speaker extraction aims to extract the listener's brain-attended speech signal
from a multi-talker speech signal, in which the attention is derived from the cortical activity …

Rethinking the visual cues in audio-visual speaker extraction

J Li, M Ge, R Cao, L Wang, J Dang, S Zhang - arXiv preprint arXiv …, 2023 - arxiv.org
The Audio-Visual Speaker Extraction (AVSE) algorithm employs parallel video recording to
leverage two visual cues, namely speaker identity and synchronization, to enhance …

MMAL: Multi-Modal Analytic Learning for Exemplar-Free Audio-Visual Class Incremental Tasks

X Yue, X Zhang, Y Chen, C Zhang, M Lao… - Proceedings of the …, 2024 - dl.acm.org
Class-incremental learning poses a significant challenge under an exemplar-free constraint,
leading to catastrophic forgetting and sub-par incremental accuracy. Previous attempts have …

Sparsity-driven EEG channel selection for brain-assisted speech enhancement

J Zhang, QT Xu, ZH Ling, H Li - arXiv preprint arXiv:2311.13436, 2023 - arxiv.org
Speech enhancement is widely used as a front-end to improve the speech quality in many
audio systems, while it is hard to extract the target speech in multi-talker conditions without …