Using transformers for multimodal emotion recognition: Taxonomies and state of the art review

S Hazmoune, F Bougamouza - Engineering Applications of Artificial …, 2024 - Elsevier
Emotion recognition is an aspect of human-computer interaction, affective computing, and
social robotics. Conventional unimodal approaches for emotion recognition, depending on …

Vesper: A compact and effective pretrained model for speech emotion recognition

W Chen, X Xing, P Chen, X Xu - IEEE Transactions on Affective …, 2024 - ieeexplore.ieee.org
This paper presents a paradigm that adapts general large-scale pretrained models (PTMs)
to speech emotion recognition task. Although PTMs shed new light on artificial general …

Multimodal emotion recognition using cross modal audio-video fusion with attention and deep metric learning

B Mocanu, R Tapu, T Zaharia - Image and Vision Computing, 2023 - Elsevier
In the last few years, the multi-modal emotion recognition has become an important research
issue in the affective computing community due to its wide range of applications that include …

DreamTalk: When expressive talking head generation meets diffusion probabilistic models

Y Ma, S Zhang, J Wang, X Wang, Y Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models have shown remarkable success in a variety of downstream generative
tasks, yet remain under-explored in the important and challenging expressive talking head …

SpeechFormer++: A hierarchical efficient framework for paralinguistic speech processing

W Chen, X Xing, X Xu, J Pang… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org
Paralinguistic speech processing is important in addressing many issues, such as sentiment
and neurocognitive disorder analyses. Recently, Transformer has achieved remarkable …

Cross-language speech emotion recognition using multimodal dual attention transformers

SAM Zaidi, S Latif, J Qadir - arXiv preprint arXiv:2306.13804, 2023 - arxiv.org
Despite the recent progress in speech emotion recognition (SER), state-of-the-art systems
are unable to achieve improved performance in cross-language settings. In this paper, we …

Knowledge-aware Bayesian co-attention for multimodal emotion recognition

Z Zhao, Y Wang, Y Wang - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Multimodal emotion recognition is a challenging research area that aims to fuse different
modalities to predict human emotion. However, most existing models that are based on …

Multi-scale temporal transformer for speech emotion recognition

Z Li, X Xing, Y Fang, W Zhang, H Fan, X Xu - arXiv preprint arXiv …, 2024 - arxiv.org
Speech emotion recognition plays a crucial role in human-machine interaction systems.
Recently various optimized Transformers have been successfully applied to speech emotion …

DST: Deformable speech transformer for emotion recognition

W Chen, X Xing, X Xu, J Pang… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Enabled by multi-head self-attention, Transformer has exhibited remarkable results in
speech emotion recognition (SER). Compared to the original full attention mechanism …

A multi-stage hierarchical relational graph neural network for multimodal sentiment analysis

P Gong, J Liu, X Zhang, X Li - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Multimodal sentiment analysis targets at accurately perceiving the emotional states by
incorporating related information from multiple sources. However, existing methods mostly …