RETRACTED ARTICLE: ICDN: integrating consistency and difference networks by transformer for multimodal sentiment analysis

Q Zhang, L Shi, P Liu, Z Zhu, L Xu - Applied Intelligence, 2023 - Springer
The sentiment of human language is usually reflected through multimodal forms such as
natural language, facial expression, and voice intonation. However, the previous research …

TSPNet: Translation supervised prototype network via residual learning for multimodal social relation extraction

H Kang, X Li, L **, C Liu, Z Zhang, S Li, Y Zhang - Neurocomputing, 2022 - Elsevier
Multimodal social relation extraction requires sufficient features fusion to identify the relation
between different targets. Compared with traditional multimodal social relation extraction …

Two stage multi-modal modeling for video interaction analysis in deep video understanding challenge

S Sun, X **ong, Y Zheng - Proceedings of the 30th ACM International …, 2022 - dl.acm.org
Interaction understanding between different entities in human-centered movie video is
receiving more and more attention. Recently, a deep video understanding (DVU) task is …

Shifted GCN-GAT and Cumulative-Transformer based Social Relation Recognition for Long Videos

H Wang, Y Hu, Y Zhu, J Qi, B Wu - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Social Relation Recognition is an important part of Video Understanding, providing insights
into the information that videos convey. Most previous works mainly focused on graph …

Multimodal analysis for deep video understanding with video language transformer

B Zhang, Y Fang, T Ren, G Wu - Proceedings of the 30th ACM …, 2022 - dl.acm.org
The Deep Video Understanding Challenge (DVUC) is aimed to use multiple modality
information to build high-level understanding of video, involving tasks such as relationship …

Multimodal early fusion operators for temporal video scene segmentation tasks

AAR Beserra, R Goularte - Multimedia Tools and Applications, 2023 - Springer
The Temporal Video Scene Segmentation (TVSS) task is still an open problem
presenting challenges in the Multimedia Analysis area. Current approaches employ …

MT-TCCT: Multi-task learning for multimodal emotion recognition

Y Wang, Z Chen, S Chen, Y Zhu - International Conference on Artificial …, 2022 - Springer
Multimodal emotion recognition is an emerging research field, which aims to capture
affective information from multimodal data, such as natural language, facial expression, and …

Hybrid improvements in multimodal analysis for deep video understanding

B Zhang, F Yu, Y Fang, T Ren, G Wu - Proceedings of the 3rd ACM …, 2021 - dl.acm.org
The Deep Video Understanding Challenge (DVU) is a task that focuses on comprehending
long duration videos which involve many entities. Its main goal is to build relationship and …

A Multi-Stream Approach for Video Understanding

L Kunam, L Rossetto, A Bernstein - Proceedings of the 30th ACM …, 2022 - dl.acm.org
The automatic annotation of higher-level semantic information in long-form video content is
still a challenging task. The Deep Video Understanding (DVU) Challenge aims at catalyzing …

Development of a MultiModal Annotation Framework and Dataset for Deep Video Understanding

E Loc, K Curtis, G Awad, S Rajput… - Proceedings of the 2nd …, 2022 - aclanthology.org
In this paper we introduce our approach and methods for collecting and annotating a new
dataset for deep video understanding. The proposed dataset is composed of 3 seasons (15 …