Is someone speaking? Exploring long-term temporal features for audio-visual active speaker detection
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or
more speakers. Successful ASD depends on accurate interpretation of short-term and …
Audio-visual cross-attention network for robotic speaker tracking
Audio-visual signals can be used jointly for robotic perception as they complement each
other. Such multi-modal sensory fusion has a clear advantage, especially under noisy …
Restoring speaking lips from occlusion for audio-visual speech recognition
Prior studies on audio-visual speech recognition typically assume the visibility of speaking
lips, ignoring the fact that visual occlusion occurs in real-world videos, thus adversely …
USEV: Universal speaker extraction with visual cue
A speaker extraction algorithm seeks to extract the target speaker's speech from a
multi-talker speech mixture. Prior studies focus mostly on speaker extraction from a highly …
NeuroHeed: Neuro-steered speaker extraction using EEG signals
Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent …
Selective listening by synchronizing speech with lips
A speaker extraction algorithm seeks to extract the speech of a target speaker from a
multi-talker speech mixture when given a cue that represents the target speaker, such as a pre …
AV-SepFormer: Cross-attention SepFormer for audio-visual target speaker extraction
Visual information can serve as an effective cue for target speaker extraction (TSE) and is
vital to improving extraction performance. In this paper, we propose AV-SepFormer, a …
Target speech diarization with multimodal prompts
Traditional speaker diarization seeks to detect "who spoke when" according to speaker
characteristics. Extending to target speech diarization, we detect "when target event …
Speaker extraction with co-speech gestures cue
Speaker extraction seeks to extract the clean speech of a target speaker from a multi-talker
speech mixture. Previous studies have used a pre-recorded speech sample or face …
Rethinking the visual cues in audio-visual speaker extraction
The Audio-Visual Speaker Extraction (AVSE) algorithm employs parallel video recording to
leverage two visual cues, namely speaker identity and synchronization, to enhance …