- Academic Search

USEV: Universal speaker extraction with visual cue

Z Pan, M Ge, H Li - IEEE/ACM Transactions on Audio, Speech …, 2022 - ieeexplore.ieee.org

A speaker extraction algorithm seeks to extract the target speaker's speech from a multi-talker
speech mixture. The prior studies focus mostly on speaker extraction from a highly …

Cited by 49 · All 4 versions

NeuroHeed: Neuro-steered speaker extraction using EEG signals

Z Pan, M Borsdorf, S Cai, T Schultz… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent …

Cited by 17 · All 5 versions

A survey on deep multi-modal learning for body language recognition and generation

L Liu, L Gao, W Lei, F Ma, X Lin, J Wang - arXiv preprint arXiv:2308.08849, 2023 - arxiv.org

Body language (BL) refers to the non-verbal communication expressed through physical
movements, gestures, facial expressions, and postures. It is a form of communication that …

Cited by 5 · All 2 versions

Target active speaker detection with audio-visual cues

Y Jiang, R Tao, Z Pan, H Li - arXiv preprint arXiv:2305.12831, 2023 - arxiv.org

In active speaker detection (ASD), we would like to detect whether an on-screen person is
speaking based on audio-visual cues. Previous studies have primarily focused on modeling …

Cited by 18 · All 6 versions

MSFNet: Multi-scale fusion network for brain-controlled speaker extraction

C Fan, J Zhang, H Zhang, W Xiang, J Tao, X Li… - Proceedings of the …, 2024 - dl.acm.org

Speaker extraction aims to selectively extract the target speaker from the multi-talker
environment under the guidance of auxiliary reference. Recent studies have shown that the …

Cited by 5 · All 2 versions

Time-domain speech separation networks with graph encoding auxiliary

T Wang, Z Pan, M Ge, Z Yang… - IEEE Signal Processing …, 2023 - ieeexplore.ieee.org

End-to-end time-domain speech separation with masking strategy has shown its
performance advantage, where a 1-D convolutional layer is used as the speech encoder to …

Cited by 15 · All 2 versions

NeuroHeed+: Improving neuro-steered speaker extraction with joint auditory attention detection

Z Pan, G Wichern, FG Germain… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Neuro-steered speaker extraction aims to extract the listener's brain-attended speech signal
from a multi-talker speech signal, in which the attention is derived from the cortical activity …

Cited by 9 · All 8 versions

Rethinking the visual cues in audio-visual speaker extraction

J Li, M Ge, R Cao, L Wang, J Dang, S Zhang - arXiv preprint arXiv …, 2023 - arxiv.org

The Audio-Visual Speaker Extraction (AVSE) algorithm employs parallel video recording to
leverage two visual cues, namely speaker identity and synchronization, to enhance …

Cited by 12 · All 4 versions

MMAL: Multi-Modal Analytic Learning for Exemplar-Free Audio-Visual Class Incremental Tasks

X Yue, X Zhang, Y Chen, C Zhang, M Lao… - Proceedings of the …, 2024 - dl.acm.org

Class-incremental learning poses a significant challenge under an exemplar-free constraint,
leading to catastrophic forgetting and sub-par incremental accuracy. Previous attempts have …

Cited by 2 · All 2 versions

Sparsity-driven EEG channel selection for brain-assisted speech enhancement

J Zhang, QT Xu, ZH Ling, H Li - arXiv preprint arXiv:2311.13436, 2023 - arxiv.org

Speech enhancement is widely used as a front-end to improve the speech quality in many
audio systems, while it is hard to extract the target speech in multi-talker conditions without …

Cited by 6 · All 2 versions

Speaker extraction with co-speech gestures cue

USEV: Universal speaker extraction with visual cue
