RETRACTED ARTICLE: ICDN: integrating consistency and difference networks by transformer for multimodal sentiment analysis
Q Zhang, L Shi, P Liu, Z Zhu, L Xu - Applied Intelligence, 2023 - Springer
The sentiment of human language is usually reflected through multimodal forms such as
natural language, facial expression, and voice intonation. However, the previous research …
TSPNet: Translation supervised prototype network via residual learning for multimodal social relation extraction
Multimodal social relation extraction requires sufficient features fusion to identify the relation
between different targets. Compared with traditional multimodal social relation extraction …
Two stage multi-modal modeling for video interaction analysis in deep video understanding challenge
Interaction understanding between different entities in human-centered movie video is
receiving more and more attention. Recently, a deep video understanding (DVU) task is …
Shifted GCN-GAT and Cumulative-Transformer based Social Relation Recognition for Long Videos
Social Relation Recognition is an important part of Video Understanding, providing insights
into the information that videos convey. Most previous works mainly focused on graph …
Multimodal analysis for deep video understanding with video language transformer
The Deep Video Understanding Challenge (DVUC) is aimed to use multiple modality
information to build high-level understanding of video, involving tasks such as relationship …
Multimodal early fusion operators for temporal video scene segmentation tasks
The Temporal Video Scene Segmentation (TVSS) task is still an open problem
presenting challenges in the Multimedia Analysis area. Current approaches employ …
MT-TCCT: Multi-task learning for multimodal emotion recognition
Y Wang, Z Chen, S Chen, Y Zhu - International Conference on Artificial …, 2022 - Springer
Multimodal emotion recognition is an emerging research field, which aims to capture
affective information from multimodal data, such as natural language, facial expression, and …
Hybrid improvements in multimodal analysis for deep video understanding
The Deep Video Understanding Challenge (DVU) is a task that focuses on comprehending
long duration videos which involve many entities. Its main goal is to build relationship and …
A Multi-Stream Approach for Video Understanding
The automatic annotation of higher-level semantic information in long-form video content is
still a challenging task. The Deep Video Understanding (DVU) Challenge aims at catalyzing …
Development of a MultiModal Annotation Framework and Dataset for Deep Video Understanding
In this paper we introduce our approach and methods for collecting and annotating a new
dataset for deep video understanding. The proposed dataset is composed of 3 seasons (15 …