Google Scholar

A survey of identity recognition via data fusion and feature learning

Z Qin, P Zhao, T Zhuang, F Deng, Y Ding, D Chen - Information Fusion, 2023 - Elsevier

With the rapid development of the Mobile Internet and the Industrial Internet of Things, a
variety of applications put forward an urgent demand for user and device identity …

Cited by 44

Target speech diarization with multimodal prompts

Y Jiang, R Tao, Z Chen, Y Qian, H Li - arXiv preprint arXiv:2406.07198, 2024 - arxiv.org

Traditional speaker diarization seeks to detect "who spoke when" according to speaker
characteristics. Extending to target speech diarization, we detect "when target event …

Cited by 7

Single-branch network for multimodal training

MS Saeed, S Nawaz, MH Khan… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

With the rapid growth of social media platforms, users are sharing billions of multimedia
posts containing audio, images, and text. Researchers have focused on building …

Cited by 18

Fusion and orthogonal projection for improved face-voice association

MS Saeed, MH Khan, S Nawaz… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

We study the problem of learning association between face and voice. Prior works adopt
pairwise or triplet loss formulations to learn an embedding space amenable for associated …

Cited by 32

Cross-modal perceptionist: Can face geometry be gleaned from voices?

CY Wu, CC Hsu, U Neumann - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

This work digs into a root question in human perception: can face geometry be gleaned from
one's voices? Previous works that study this question only adopt developments in image …

Cited by 20

Unsupervised voice-face representation learning by cross-modal prototype contrast

B Zhu, K Xu, C Wang, Z Qin, T Sun, H Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

We present an approach to learn voice-face representations from the talking face videos,
without any identity labels. Previous works employ cross-modal instance discrimination …

Cited by 20

Audio-visual speaker verification via joint cross-attention

GP Rajasekhar, J Alam - International Conference on Speech and …, 2023 - Springer

Speaker verification has been widely explored using speech signals, which has shown
significant improvement using deep models. Recently, there has been a surge in exploring …

Cited by 9

Rethinking voice-face correlation: A geometry view

X Li, Y Wen, M Yang, J Wang, R Singh… - Proceedings of the 31st …, 2023 - dl.acm.org

Previous works on voice-face matching and voice-guided face synthesis demonstrate strong
correlations between voice and face, but mainly rely on coarse semantic cues such as …

Cited by 7

VoiceStyle: Voice-based Face Generation Via Cross-modal Prototype Contrastive Learning

W Chen, B Zhu, K Xu, Y Dou, D Feng - ACM Transactions on Multimedia …, 2024 - dl.acm.org

Can we predict a person's appearance solely based on their voice? This article explores this
question by focusing on generating a face from an unheard voice segment. Our proposed …

Cited by 3

Speaker recognition in realistic scenario using multimodal data

SH Shah, MS Saeed, S Nawaz… - 2023 3rd International …, 2023 - ieeexplore.ieee.org

In recent years, an association has been established between faces and voices of celebrities
by leveraging large-scale audio-visual information from YouTube. The availability of large-scale …

Cited by 13

Cross-modal speaker verification and recognition: A multilingual perspective

A survey of identity recognition via data fusion and feature learning
