- Academic Search

Deep learning for visual speech analysis: A survey

C Sheng, G Kuang, L Bai, C Hou, Y Guo… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Visual speech, referring to the visual domain of speech, has attracted increasing attention
due to its wide applications, such as public security, medical treatment, military defense, and …

Cited by 46 (11 versions)

[PDF] arxiv.org

Facefilter: Audio-visual speech separation using still images

SW Chung, S Choe, JS Chung, HG Kang - arXiv preprint arXiv …, 2020 - arxiv.org

The objective of this paper is to separate a target speaker's speech from a mixture of two
speakers using a deep audio-visual speech separation network. Unlike previous works that …

Cited by 79 (9 versions)

[PDF] arxiv.org

Imaginary voice: Face-styled diffusion model for text-to-speech

J Lee, JS Chung, SW Chung - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

The goal of this work is zero-shot text-to-speech synthesis, with speaking styles and voices
learnt from facial characteristics. Inspired by the natural fact that people can imagine the …

Cited by 27 (7 versions)

[PDF] thecvf.com

Looking into your speech: Learning cross-modal affinity for audio-visual speech separation

J Lee, SW Chung, S Kim, HG Kang… - Proceedings of the …, 2021 - openaccess.thecvf.com

In this paper, we address the problem of separating individual speech signals from videos
using audio-visual neural processing. Most conventional approaches utilize frame-wise …

Cited by 54 (7 versions)

[PDF] arxiv.org

Lira: Learning visual speech representations from audio through self-supervision

P Ma, R Mira, S Petridis, BW Schuller… - arXiv preprint arXiv …, 2021 - arxiv.org

The large amount of audiovisual content being shared online today has drawn substantial
attention to the prospect of audiovisual self-supervised learning. Recent works have focused …

Cited by 52 (6 versions)

[PDF] arxiv.org

Target speech diarization with multimodal prompts

Y Jiang, R Tao, Z Chen, Y Qian, H Li - arXiv preprint arXiv:2406.07198, 2024 - arxiv.org

Traditional speaker diarization seeks to detect "who spoke when" according to speaker
characteristics. Extending to target speech diarization, we detect "when target event …

Cited by 7 (2 versions)

[PDF] arxiv.org

Vocalist: An audio-visual synchronisation model for lips and voices

VS Kadandale, JF Montesinos, G Haro - arXiv preprint arXiv:2204.02090, 2022 - arxiv.org

In this paper, we address the problem of lip-voice synchronisation in videos containing
human face and voice. Our approach is based on determining if the lips motion and the …

Cited by 27 (9 versions)

[PDF] arxiv.org

Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision

SW Chung, HG Kang, JS Chung - arXiv preprint arXiv:2004.14326, 2020 - arxiv.org

The goal of this work is to train discriminative cross-modal embeddings without access to
manually annotated data. Recent advances in self-supervised learning have shown that …

Cited by 47 (8 versions)

[PDF] arxiv.org

Look who's talking: Active speaker detection in the wild

YJ Kim, HS Heo, S Choe, SW Chung, Y Kwon… - arXiv preprint arXiv …, 2021 - arxiv.org

In this work, we present a novel audio-visual dataset for active speaker detection in the wild.
A speaker is considered active when his or her face is visible and the voice is audible …

Cited by 30 (7 versions)

[PDF] ieee.org

Improved lite audio-visual speech enhancement

SY Chuang, HM Wang, Y Tsao - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org

Numerous studies have investigated the effectiveness of audio-visual multimodal learning
for speech enhancement (AVSE) tasks, seeking a solution that uses visual data as auxiliary …

Cited by 40 (7 versions)

Perfect match: Self-supervised embeddings for cross-modal retrieval

Deep learning for visual speech analysis: A survey
