Google Scholar

MS Saeed, S Nawaz, MH Khan… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

With the rapid growth of social media platforms, users are sharing billions of multimedia
posts containing audio, images, and text. Researchers have focused on building …

Cited by 18 · All 6 versions

[PDF] arxiv.org

Speaker recognition in realistic scenario using multimodal data

SH Shah, MS Saeed, S Nawaz… - 2023 3rd International …, 2023 - ieeexplore.ieee.org

In recent years, an association is established between faces and voices of celebrities
leveraging large scale audio-visual information from YouTube. The availability of large scale …

Cited by 13 · All 3 versions

[PDF] arxiv.org

DCTM: Dilated convolutional transformer model for multimodal engagement estimation in conversation

VN Tu, VT Huynh, HJ Yang, SH Kim, S Nawaz… - Proceedings of the 31st …, 2023 - dl.acm.org

Conversational engagement estimation is posed as a regression problem, entailing the
identification of the favorable attention and involvement of the participants in the …

Cited by 7 · All 6 versions

Multimodal pre-train then transfer learning approach for speaker recognition

S Jabeen, MS Amin, X Li - Multimedia Tools and Applications, 2024 - Springer

Cognitive science has well-established the correlation between faces and voices because
neuro-cognitive pathways of both information share the same structure. Recently, the task …

Cited by 1

[HTML] mdpi.com

[HTML] Audio–Visual Fusion Based on Interactive Attention for Person Verification

X Jing, L He, Z Song, S Wang - Sensors, 2023 - mdpi.com

With the rapid development of multimedia technology, personnel verification systems have
become increasingly important in the security field and identity verification. However …

Cited by 1 · All 10 versions

Learning branched fusion and orthogonal projection for face-voice association

Single-branch network for multimodal training