Google Acadèmic

Is someone speaking? exploring long-term temporal features for audio-visual active speaker detection

R Tao, Z Pan, RK Das, X Qian, MZ Shou… - Proceedings of the 29th …, 2021 - dl.acm.org

Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or
more speakers. The successful ASD depends on accurate interpretation of short-term and …

Desa Cita Citat per 191 Articles relacionats Totes les 5 versions Free GPT-4 DeepSeek

[Free GPT-4]
[DeepSeek]

[PDF] ieee.org

Audio-visual cross-attention network for robotic speaker tracking

X Qian, Z Wang, J Wang, G Guan… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org

Audio-visual signals can be used jointly for robotic perception as they complement each
other. Such multi-modal sensory fusion has a clear advantage, especially under noisy …

Desa Cita Citat per 36 Articles relacionats Totes les 2 versions Free GPT-4 DeepSeek

[Free GPT-4]
[DeepSeek]

[PDF] aaai.org

Restoring speaking lips from occlusion for audio-visual speech recognition

J Wang, Z Pan, M Zhang, RT Tan, H Li - Proceedings of the AAAI …, 2024 - ojs.aaai.org

Prior studies on audio-visual speech recognition typically assume the visibility of speaking
lips, ignoring the fact that visual occlusion occurs in real-world videos, thus adversely …

Desa Cita Citat per 9 Articles relacionats Totes les 2 versions Free GPT-4 DeepSeek Versió HTML

[Free GPT-4]
[DeepSeek]

[PDF] ieee.org

USEV: Universal speaker extraction with visual cue

Z Pan, M Ge, H Li - IEEE/ACM Transactions on Audio, Speech …, 2022 - ieeexplore.ieee.org

A speaker extraction algorithm seeks to extract the target speaker's speech from a multi-
talker speech mixture. The prior studies focus mostly on speaker extraction from a highly …

Desa Cita Citat per 49 Articles relacionats Totes les 4 versions Free GPT-4 DeepSeek

[Free GPT-4]
[DeepSeek]

[PDF] ieee.org

NeuroHeed: Neuro-steered speaker extraction using EEG signals

Z Pan, M Borsdorf, S Cai, T Schultz… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent …

Desa Cita Citat per 16 Articles relacionats Totes les 5 versions Free GPT-4 DeepSeek

[Free GPT-4]
[DeepSeek]

[PDF] ieee.org

Selective listening by synchronizing speech with lips

Z Pan, R Tao, C Xu, H Li - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org

A speaker extraction algorithm seeks to extract the speech of a target speaker from a multi-
talker speech mixture when given a cue that represents the target speaker, such as a pre …

Desa Cita Citat per 48 Articles relacionats Totes les 4 versions Free GPT-4 DeepSeek

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Av-sepformer: Cross-attention sepformer for audio-visual target speaker extraction

J Lin, X Cai, H Dinkel, J Chen, Z Yan… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Visual information can serve as an effective cue for target speaker extraction (TSE) and is
vital to improving extraction performance. In this paper, we propose AV-SepFormer, a …

Desa Cita Citat per 21 Articles relacionats Totes les 3 versions Free GPT-4 DeepSeek

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Target speech diarization with multimodal prompts

Y Jiang, R Tao, Z Chen, Y Qian, H Li - arxiv preprint arxiv:2406.07198, 2024 - arxiv.org

Traditional speaker diarization seeks to detect``who spoke when''according to speaker
characteristics. Extending to target speech diarization, we detect``when target event …

Desa Cita Citat per 7 Articles relacionats Totes les 2 versions Free GPT-4 DeepSeek Versió HTML

[Free GPT-4]
[DeepSeek]

[PDF] ieee.org

Speaker extraction with co-speech gestures cue

Z Pan, X Qian, H Li - IEEE Signal Processing Letters, 2022 - ieeexplore.ieee.org

Speaker extraction seeks to extract the clean speech of a target speaker from a multi-talker
mixture speech. There have been studies to use a pre-recorded speech sample or face …

Desa Cita Citat per 28 Articles relacionats Totes les 3 versions Free GPT-4 DeepSeek

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Rethinking the visual cues in audio-visual speaker extraction

J Li, M Ge, R Cao, L Wang, J Dang, S Zhang - arxiv preprint arxiv …, 2023 - arxiv.org

The Audio-Visual Speaker Extraction (AVSE) algorithm employs parallel video recording to
leverage two visual cues, namely speaker identity and synchronization, to enhance …

Desa Cita Citat per 12 Articles relacionats Totes les 4 versions Free GPT-4 DeepSeek Versió HTML

Crea una alerta

Cita

Cerca avançada

S'ha desat a La meva biblioteca

Muse: Multi-modal target speaker extraction with visual cues

Is someone speaking? exploring long-term temporal features for audio-visual active speaker detection

Audio-visual cross-attention network for robotic speaker tracking

Restoring speaking lips from occlusion for audio-visual speech recognition

USEV: Universal speaker extraction with visual cue

NeuroHeed: Neuro-steered speaker extraction using EEG signals

Selective listening by synchronizing speech with lips

Av-sepformer: Cross-attention sepformer for audio-visual target speaker extraction

Target speech diarization with multimodal prompts

Speaker extraction with co-speech gestures cue

Rethinking the visual cues in audio-visual speaker extraction