A survey of identity recognition via data fusion and feature learning

Z Qin, P Zhao, T Zhuang, F Deng, Y Ding, D Chen - Information Fusion, 2023 - Elsevier
With the rapid development of the Mobile Internet and the Industrial Internet of Things, a
variety of applications put forward an urgent demand for user and device identity …

Target speech diarization with multimodal prompts

Y Jiang, R Tao, Z Chen, Y Qian, H Li - arXiv preprint arXiv:2406.07198, 2024 - arxiv.org
Traditional speaker diarization seeks to detect "who spoke when" according to speaker
characteristics. Extending to target speech diarization, we detect "when target event …

Single-branch network for multimodal training

MS Saeed, S Nawaz, MH Khan… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
With the rapid growth of social media platforms, users are sharing billions of multimedia
posts containing audio, images, and text. Researchers have focused on building …

Fusion and orthogonal projection for improved face-voice association

MS Saeed, MH Khan, S Nawaz… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
We study the problem of learning association between face and voice. Prior works adopt
pairwise or triplet loss formulations to learn an embedding space amenable for associated …

Cross-modal perceptionist: Can face geometry be gleaned from voices?

CY Wu, CC Hsu, U Neumann - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
This work digs into a root question in human perception: can face geometry be gleaned from
one's voices? Previous works that study this question only adopt developments in image …

Unsupervised voice-face representation learning by cross-modal prototype contrast

B Zhu, K Xu, C Wang, Z Qin, T Sun, H Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
We present an approach to learn voice-face representations from the talking face videos,
without any identity labels. Previous works employ cross-modal instance discrimination …

Audio-visual speaker verification via joint cross-attention

GP Rajasekhar, J Alam - International Conference on Speech and …, 2023 - Springer
Speaker verification has been widely explored using speech signals, which has shown
significant improvement using deep models. Recently, there has been a surge in exploring …

Rethinking voice-face correlation: A geometry view

X Li, Y Wen, M Yang, J Wang, R Singh… - Proceedings of the 31st …, 2023 - dl.acm.org
Previous works on voice-face matching and voice-guided face synthesis demonstrate strong
correlations between voice and face, but mainly rely on coarse semantic cues such as …

VoiceStyle: Voice-based Face Generation Via Cross-modal Prototype Contrastive Learning

W Chen, B Zhu, K Xu, Y Dou, D Feng - ACM Transactions on Multimedia …, 2024 - dl.acm.org
Can we predict a person's appearance solely based on their voice? This article explores this
question by focusing on generating a face from an unheard voice segment. Our proposed …

Speaker recognition in realistic scenario using multimodal data

SH Shah, MS Saeed, S Nawaz… - 2023 3rd International …, 2023 - ieeexplore.ieee.org
In recent years, an association has been established between faces and voices of celebrities
leveraging large-scale audio-visual information from YouTube. The availability of large-scale …