Deep learning for visual speech analysis: A survey
Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …
FaceFilter: Audio-visual speech separation using still images
The objective of this paper is to separate a target speaker's speech from a mixture of two
speakers using a deep audio-visual speech separation network. Unlike previous works that …
Imaginary voice: Face-styled diffusion model for text-to-speech
The goal of this work is zero-shot text-to-speech synthesis, with speaking styles and voices
learnt from facial characteristics. Inspired by the natural fact that people can imagine the …
Looking into your speech: Learning cross-modal affinity for audio-visual speech separation
In this paper, we address the problem of separating individual speech signals from videos
using audio-visual neural processing. Most conventional approaches utilize frame-wise …
LiRA: Learning visual speech representations from audio through self-supervision
The large amount of audiovisual content being shared online today has drawn substantial
attention to the prospect of audiovisual self-supervised learning. Recent works have focused …
Target speech diarization with multimodal prompts
Traditional speaker diarization seeks to detect "who spoke when" according to speaker
characteristics. Extending to target speech diarization, we detect "when target event …
VocaLiST: An audio-visual synchronisation model for lips and voices
In this paper, we address the problem of lip-voice synchronisation in videos containing
human face and voice. Our approach is based on determining if the lips motion and the …
Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision
The goal of this work is to train discriminative cross-modal embeddings without access to
manually annotated data. Recent advances in self-supervised learning have shown that …
Look who's talking: Active speaker detection in the wild
In this work, we present a novel audio-visual dataset for active speaker detection in the wild.
A speaker is considered active when his or her face is visible and the voice is audible …
Improved lite audio-visual speech enhancement
Numerous studies have investigated the effectiveness of audio-visual multimodal learning
for speech enhancement (AVSE) tasks, seeking a solution that uses visual data as auxiliary …