HiCMAE: Hierarchical contrastive masked autoencoder for self-supervised audio-visual emotion recognition

L Sun, Z Lian, B Liu, J Tao - Information Fusion, 2024 - Elsevier
Audio-Visual Emotion Recognition (AVER) has garnered increasing attention in
recent years for its critical role in creating emotion-aware intelligent machines. Previous …

Selective acoustic feature enhancement for speech emotion recognition with noisy speech

SG Leem, D Fulford, JP Onnela… - … /ACM transactions on …, 2023 - ieeexplore.ieee.org
A speech emotion recognition (SER) system deployed on a real-world application can
encounter speech contaminated with unconstrained background noise. To deal with this …

Versatile audio-visual learning for handling single and multi modalities in emotion regression and classification tasks

L Goncalves, SG Leem, WC Lin, B Sisman… - arXiv preprint arXiv …, 2023 - ecs.utdallas.edu
Most current audio-visual emotion recognition models lack the flexibility needed for
deployment in practical applications. We envision a multimodal system that works even …

Versatile audio-visual learning for emotion recognition

L Goncalves, SG Leem, WC Lin… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Most current audio-visual emotion recognition models lack the flexibility needed for
deployment in practical applications. We envision a multimodal system that works even …

Deep temporal clustering features for speech emotion recognition

WC Lin, C Busso - Speech Communication, 2024 - Elsevier
Deep clustering is a popular unsupervised technique for feature representation learning. We
recently proposed the chunk-based DeepEmoCluster framework for speech emotion …

Enhancing resilience to missing data in audio-text emotion recognition with multi-scale chunk regularization

WC Lin, L Goncalves, C Busso - … of the 25th International Conference on …, 2023 - dl.acm.org
Most existing audio-text emotion recognition studies have focused on the computational
modeling aspects, including strategies for fusing the modalities. An area that has received …

Detail-Enhanced Intra- and Inter-modal Interaction for Audio-Visual Emotion Recognition

T Shi, X Ge, JM Jose, N Pugeault… - … Conference on Pattern …, 2025 - Springer
Capturing complex temporal relationships between video and audio modalities is vital for
Audio-Visual Emotion Recognition (AVER). However, existing methods lack attention to …

Jointly Learning from Unimodal and Multimodal-Rated Labels in Audio-Visual Emotion Recognition

L Goncalves, HC Chou, AN Salman… - IEEE Open Journal …, 2025 - ieeexplore.ieee.org
Audio-visual emotion recognition (AVER) has been an important research area in human-computer
interaction (HCI). Traditionally, audio-visual emotional datasets and …