A survey of identity recognition via data fusion and feature learning
With the rapid development of the Mobile Internet and the Industrial Internet of Things, a
variety of applications put forward an urgent demand for user and device identity …
Target speech diarization with multimodal prompts
Traditional speaker diarization seeks to detect "who spoke when" according to speaker
characteristics. Extending to target speech diarization, we detect "when target event …
Single-branch network for multimodal training
With the rapid growth of social media platforms, users are sharing billions of multimedia
posts containing audio, images, and text. Researchers have focused on building …
Fusion and orthogonal projection for improved face-voice association
We study the problem of learning association between face and voice. Prior works adopt
pairwise or triplet loss formulations to learn an embedding space amenable for associated …
Cross-modal perceptionist: Can face geometry be gleaned from voices?
This work digs into a root question in human perception: can face geometry be gleaned from
one's voices? Previous works that study this question only adopt developments in image …
Unsupervised voice-face representation learning by cross-modal prototype contrast
We present an approach to learn voice-face representations from the talking face videos,
without any identity labels. Previous works employ cross-modal instance discrimination …
Audio-visual speaker verification via joint cross-attention
GP Rajasekhar, J Alam - International Conference on Speech and …, 2023 - Springer
Speaker verification has been widely explored using speech signals, which has shown
significant improvement using deep models. Recently, there has been a surge in exploring …
Rethinking voice-face correlation: A geometry view
Previous works on voice-face matching and voice-guided face synthesis demonstrate strong
correlations between voice and face, but mainly rely on coarse semantic cues such as …
VoiceStyle: Voice-based Face Generation Via Cross-modal Prototype Contrastive Learning
W Chen, B Zhu, K Xu, Y Dou, D Feng - ACM Transactions on Multimedia …, 2024 - dl.acm.org
Can we predict a person's appearance solely based on their voice? This article explores this
question by focusing on generating a face from an unheard voice segment. Our proposed …
Speaker recognition in realistic scenario using multimodal data
In recent years, an association is established between faces and voices of celebrities
leveraging large scale audio-visual information from YouTube. The availability of large scale …