Deep learning-based multimodal emotion recognition from audio, visual, and text modalities: A systematic review of recent advancements and future prospects

S Zhang, Y Yang, C Chen, X Zhang, Q Leng… - Expert Systems with …, 2024 - Elsevier
Emotion recognition has recently attracted extensive interest due to its significant
applications to human–computer interaction. The expression of human emotion depends on …

Large language models meet text-centric multimodal sentiment analysis: A survey

H Yang, Y Zhao, Y Wu, S Wang, T Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Compared to traditional sentiment analysis, which only considers text, multimodal sentiment
analysis needs to consider emotional signals from multimodal sources simultaneously and …

NusaCrowd: Open source initiative for Indonesian NLP resources

S Cahyawijaya, H Lovenia, AF Aji, GI Winata… - arXiv preprint arXiv …, 2022 - arxiv.org
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for
Indonesian languages, including opening access to previously non-public resources …

Negative object presence evaluation (NOPE) to measure object hallucination in vision-language models

H Lovenia, W Dai, S Cahyawijaya, Z Ji… - arXiv preprint arXiv …, 2023 - arxiv.org
Object hallucination poses a significant challenge in vision-language (VL) models, often
leading to the generation of nonsensical or unfaithful responses with non-existent objects …

One country, 700+ languages: NLP challenges for underrepresented languages and dialects in Indonesia

AF Aji, GI Winata, F Koto, S Cahyawijaya… - arXiv preprint arXiv …, 2022 - arxiv.org
NLP research is impeded by a lack of resources and awareness of the challenges presented
by underrepresented languages and dialects. Focusing on the languages spoken in …

Multimodal emotion detection via attention-based fusion of extracted facial and speech features

D Mamieva, AB Abdusalomov, A Kutlimuratov… - Sensors, 2023 - mdpi.com
Methods for detecting emotions that employ many modalities at the same time have been
found to be more accurate and resilient than those that rely on a single sense. This is due to …
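
The snippet only names the technique, so here is a minimal PyTorch sketch of what attention-based fusion of facial and speech features can look like. The feature dimensions, class name, cross-attention direction, and mean pooling below are assumptions for illustration, not the architecture from the paper.

import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    # Illustrative sketch (not the paper's model): speech features
    # attend over facial features via multi-head cross-attention,
    # and the fused sequence is pooled into one vector per clip.
    def __init__(self, dim=256, heads=4, num_emotions=7):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_emotions)

    def forward(self, speech_feats, face_feats):
        # speech_feats: (batch, T_speech, dim), face_feats: (batch, T_face, dim),
        # both assumed to come from pretrained modality-specific encoders.
        fused, _ = self.cross_attn(speech_feats, face_feats, face_feats)
        pooled = fused.mean(dim=1)      # average over time steps
        return self.classifier(pooled)  # emotion logits

# Usage with dummy encoder outputs:
model = CrossModalAttentionFusion()
logits = model(torch.randn(2, 50, 256), torch.randn(2, 40, 256))

Cross-attention is only one of several fusion strategies in this literature; feature concatenation and late (decision-level) fusion are common baselines.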

A facial expression-aware multimodal multi-task learning framework for emotion recognition in multi-party conversations

W Zheng, J Yu, R Xia, S Wang - … of the 61st Annual Meeting of the …, 2023 - aclanthology.org
Multimodal Emotion Recognition in Multiparty Conversations (MERMC) has
recently attracted considerable attention. Due to the complexity of visual scenes in multi …

M-SENA: An integrated platform for multimodal sentiment analysis

H Mao, Z Yuan, H Xu, W Yu, Y Liu, K Gao - arXiv preprint arXiv …, 2022 - arxiv.org
M-SENA is an open-sourced platform for Multimodal Sentiment Analysis. It aims to facilitate
advanced research by providing flexible toolkits, reliable benchmarks, and intuitive …

Multi-label multimodal emotion recognition with transformer-based fusion and emotion-level representation learning

HD Le, GS Lee, SH Kim, S Kim, HJ Yang - IEEE Access, 2023 - ieeexplore.ieee.org
Emotion recognition has been an active research area for a long time. Recently, multimodal
emotion recognition from video data has grown in importance with the explosion of video …
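
Since "multi-label" here means a clip may express several emotions at once, a minimal sketch of the corresponding output layer may help: each emotion gets an independent sigmoid trained with binary cross-entropy, rather than a single softmax over mutually exclusive classes. The feature dimension and label-set size below are assumptions, not values from the paper.

import torch
import torch.nn as nn

num_emotions = 6                       # assumed label-set size
head = nn.Linear(512, num_emotions)    # 512-dim fused features (assumed)
criterion = nn.BCEWithLogitsLoss()     # one binary decision per emotion

fused = torch.randn(8, 512)                         # batch of fused features
targets = torch.randint(0, 2, (8, num_emotions)).float()
loss = criterion(head(fused), targets)              # multi-label training loss
probs = torch.sigmoid(head(fused))                  # per-emotion probabilities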

Vision guided generative pre-trained language models for multimodal abstractive summarization

T Yu, W Dai, Z Liu, P Fung - arXiv preprint arXiv:2109.02401, 2021 - arxiv.org
Multimodal abstractive summarization (MAS) models that summarize videos (vision
modality) and their corresponding transcripts (text modality) are able to extract the essential …