USEV: Universal speaker extraction with visual cue
A speaker extraction algorithm seeks to extract the target speaker's speech from a multi-talker
speech mixture. Prior studies focus mostly on speaker extraction from a highly …
NeuroHeed: Neuro-steered speaker extraction using EEG signals
Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent …
A survey on deep multi-modal learning for body language recognition and generation
Body language (BL) refers to the non-verbal communication expressed through physical
movements, gestures, facial expressions, and postures. It is a form of communication that …
Target active speaker detection with audio-visual cues
In active speaker detection (ASD), we would like to detect whether an on-screen person is
speaking based on audio-visual cues. Previous studies have primarily focused on modeling …
MSFNet: Multi-scale fusion network for brain-controlled speaker extraction
Speaker extraction aims to selectively extract the target speaker from the multi-talker
environment under the guidance of an auxiliary reference. Recent studies have shown that the …
Time-domain speech separation networks with graph encoding auxiliary
End-to-end time-domain speech separation with masking strategy has shown its
performance advantage, where a 1-D convolutional layer is used as the speech encoder to …
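The masking strategy named in this snippet can be illustrated with a minimal NumPy sketch, assuming a Conv-TasNet-style pipeline: a strided 1-D convolutional encoder (written here as framing plus a filter matrix, followed by ReLU), element-wise multiplication by one mask per speaker, and a linear decoder that rebuilds each waveform by overlap-add. The function names `encode`, `decode` and `separate` and the hand-picked filters are hypothetical. In a real separator the filters are learned and the masks are predicted by a network; the graph encoding auxiliary from the paper is not modeled here.

```python
import numpy as np


def encode(x, basis, hop):
    """Strided 1-D convolutional encoder followed by ReLU.

    x: mixture waveform, shape (T,)
    basis: encoder filters, shape (N, L)
    returns a non-negative representation, shape (N, K)
    """
    win = basis.shape[1]
    n_frames = 1 + (len(x) - win) // hop
    # Each column is one window of the input, so basis @ frames equals a 1-D conv with stride `hop`.
    frames = np.stack([x[k * hop : k * hop + win] for k in range(n_frames)], axis=1)
    return np.maximum(basis @ frames, 0.0)


def decode(w, basis, hop, length):
    """Linear decoder: project back to frames, then overlap-add (a transposed 1-D conv)."""
    win = basis.shape[1]
    frames = basis.T @ w  # shape (L, K)
    y = np.zeros(length)
    for k in range(frames.shape[1]):
        y[k * hop : k * hop + win] += frames[:, k]
    return y


def separate(x, enc_basis, dec_basis, masks, hop):
    """Apply one mask per speaker to the shared encoding and decode each estimate."""
    w = encode(x, enc_basis, hop)
    return [decode(m * w, dec_basis, hop, len(x)) for m in masks]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    win = 4
    mixture = rng.standard_normal(32)
    # Toy filters [I; -I]: ReLU keeps the positive and negative parts of each frame separately.
    basis = np.vstack([np.eye(win), -np.eye(win)])
    w = encode(mixture, basis, hop=win)
    mask1 = rng.uniform(size=w.shape)  # stand-in for a network-predicted mask
    est1, est2 = separate(mixture, basis, basis, [mask1, 1.0 - mask1], hop=win)
    print(np.allclose(est1 + est2, mixture))  # prints True
```

With the toy filters `[I; -I]` and a hop equal to the filter length, the decoder reconstructs the input exactly, so any pair of masks that sums to one yields estimates that add back up to the mixture. This only checks the encoder/mask/decoder plumbing; it says nothing about separation quality.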
NeuroHeed+: Improving neuro-steered speaker extraction with joint auditory attention detection
Neuro-steered speaker extraction aims to extract the listener's brain-attended speech signal
from a multi-talker speech signal, in which the attention is derived from the cortical activity …
Rethinking the visual cues in audio-visual speaker extraction
The Audio-Visual Speaker Extraction (AVSE) algorithm employs parallel video recording to
leverage two visual cues, namely speaker identity and synchronization, to enhance …
MMAL: Multi-Modal Analytic Learning for Exemplar-Free Audio-Visual Class Incremental Tasks
Class-incremental learning poses a significant challenge under an exemplar-free constraint,
leading to catastrophic forgetting and sub-par incremental accuracy. Previous attempts have …
Sparsity-driven EEG channel selection for brain-assisted speech enhancement
Speech enhancement is widely used as a front-end to improve the speech quality in many
audio systems, while it is hard to extract the target speech in multi-talker conditions without …