Age-invariant face recognition by multi-feature fusion and decomposition with self-attention

C Yan, L Meng, L Li, J Zhang, Z Wang, J Yin… - ACM Transactions on …, 2022 - dl.acm.org
Different from general face recognition, age-invariant face recognition (AIFR) aims at
matching faces with a big age gap. Previous discriminative methods usually focus on …

Deep multimodal representation learning: A survey

W Guo, J Wang, S Wang - IEEE Access, 2019 - ieeexplore.ieee.org
Multimodal representation learning, which aims to narrow the heterogeneity gap among
different modalities, plays an indispensable role in the utilization of ubiquitous multimodal …

EPIC-Fusion: Audio-visual temporal binding for egocentric action recognition

E Kazakos, A Nagrani, A Zisserman… - Proceedings of the …, 2019 - openaccess.thecvf.com
We focus on multi-modal fusion for egocentric action recognition, and propose a novel
architecture for multi-modal temporal-binding, i.e. the combination of modalities within a …

Deep learning innovations in video classification: A survey on techniques and dataset evaluations

M Mao, A Lee, M Hong - Electronics, 2024 - mdpi.com
Video classification has achieved remarkable success in recent years, driven by advanced
deep learning models that automatically categorize video content. This paper provides a …

Music gesture for visual sound separation

C Gan, D Huang, H Zhao… - Proceedings of the …, 2020 - openaccess.thecvf.com
Recent deep learning approaches have achieved impressive performance on visual sound
separation tasks. However, these approaches are mostly built on appearance and optical …

Learning multi-view aggregation in the wild for large-scale 3D semantic segmentation

D Robert, B Vallet, L Landrieu - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Recent works on 3D semantic segmentation propose to exploit the synergy between images
and point clouds by processing each modality with a dedicated network and projecting …

Cross-modality attention with semantic graph embedding for multi-label classification

R You, Z Guo, L Cui, X Long, Y Bao… - Proceedings of the AAAI …, 2020 - ojs.aaai.org
Multi-label image and video classification are fundamental yet challenging tasks in computer
vision. The main challenges lie in capturing spatial or temporal dependencies between …

Foley music: Learning to generate music from videos

C Gan, D Huang, P Chen, JB Tenenbaum… - Computer Vision–ECCV …, 2020 - Springer
In this paper, we introduce Foley Music, a system that can synthesize plausible music for a
silent video clip about people playing musical instruments. We first identify two key …

Attention clusters: Purely attention based local feature integration for video classification

X Long, C Gan, G De Melo, J Wu… - Proceedings of the …, 2018 - openaccess.thecvf.com
Recently, substantial research effort has focused on how to apply CNNs or RNNs to better
capture temporal patterns in videos, so as to improve the accuracy of video classification. In …

Overview of behavior recognition based on deep learning

K Hu, J Jin, F Zheng, L Weng, Y Ding - Artificial Intelligence Review, 2023 - Springer
Human behavior recognition has always been a hot spot for research in computer vision.
With the wide application of behavior recognition in virtual reality and short video in recent …