Using transformers for multimodal emotion recognition: Taxonomies and state of the art review

S Hazmoune, F Bougamouza - Engineering Applications of Artificial …, 2024 - Elsevier
Emotion recognition is an aspect of human-computer interaction, affective computing, and
social robotics. Conventional unimodal approaches for emotion recognition, depending on …

Emotion embedding framework with emotional self-attention mechanism for speaker recognition

D Li, Z Yang, J Liu, H Yang, Z Wang - Expert Systems with Applications, 2024 - Elsevier
The emotional states of speech have a great impact on the efficiency of speaker recognition
(SR) system. Many researchers focus on how to map speech with different emotions to an …

Multi-modal speech emotion recognition: Improving accuracy through fusion of VGGish and BERT features with multi-head attention

PN Tran, TDT Vu, DNM Dang, NT Pham… - … Conference on Industrial …, 2023 - Springer
Recent research has shown that multi-modal learning is a successful method for enhancing
classification performance by mixing several forms of input, notably in speech-emotion …

Comparative analysis of multi-loss functions for enhanced multi-modal speech emotion recognition

PN Tran, TDT Vu, NT Pham… - … on Information and …, 2023 - ieeexplore.ieee.org
In recent years, multi-modal analysis has gained significant prominence across domains
such as audio/speech processing, natural language processing, and affective computing …

MERSA: Multimodal emotion recognition with self-align embedding

QB Le, KT Trinh, NDH Son, PN Tran… - 2024 International …, 2024 - ieeexplore.ieee.org
Emotions are an integral part of human communication and interaction, significantly shaping
our social connections, decision-making, and overall well-being. Understanding and …

Quantum-Enhanced Transformers for Robust Acoustic Scene Classification in IoT Environments

MK Quan, M Wijayasundara, S Setunge… - arXiv preprint arXiv …, 2025 - arxiv.org
The proliferation of Internet of Things (IoT) devices equipped with acoustic sensors
necessitates robust acoustic scene classification (ASC) capabilities, even in noisy and data …

SER-Fuse: An Emotion Recognition Application Utilizing Multi-Modal, Multi-Lingual, and Multi-Feature Fusion

NT Pham, LT Phan, DNM Dang… - Proceedings of the 12th …, 2023 - dl.acm.org
Speech emotion recognition (SER) is a crucial aspect of affective computing and human-
computer interaction, yet effectively identifying emotions in different speakers and languages …

Enhancing Speech Emotion Recognition Through Knowledge Distillation

TM Nguyen, PN Tran, DNM Dang - 2024 15th International …, 2024 - ieeexplore.ieee.org
The importance of Speech Emotion Recognition (SER) is growing across diverse
applications, which has resulted in the development of multiple methodologies and models …

Dimensional Speech Emotion Recognition from Bimodal Features

L Guder, JP Aires, F Meneguzzi… - Simpósio Brasileiro de …, 2024 - sol.sbc.org.br
Considering the human-machine relationship, affective computing aims to allow computers
to recognize or express emotions. Speech Emotion Recognition is a task from affective …

Dimensional speech emotion recognition: a bimodal approach

LDC Guder - 2024 - repositorio.pucrs.br
Considering the human-computer relationship, affective computing aims to enable
computers to recognize or express emotions. The Recognition of …