Age-invariant face recognition by multi-feature fusion and decomposition with self-attention
Different from general face recognition, age-invariant face recognition (AIFR) aims at
matching faces with a big age gap. Previous discriminative methods usually focus on …
Deep multimodal representation learning: A survey
W Guo, J Wang, S Wang - IEEE Access, 2019 - ieeexplore.ieee.org
Multimodal representation learning, which aims to narrow the heterogeneity gap among
different modalities, plays an indispensable role in the utilization of ubiquitous multimodal …
Epic-fusion: Audio-visual temporal binding for egocentric action recognition
We focus on multi-modal fusion for egocentric action recognition, and propose a novel
architecture for multi-modal temporal-binding, i.e. the combination of modalities within a …
Deep learning innovations in video classification: A survey on techniques and dataset evaluations
Video classification has achieved remarkable success in recent years, driven by advanced
deep learning models that automatically categorize video content. This paper provides a …
Music gesture for visual sound separation
Recent deep learning approaches have achieved impressive performance on visual sound
separation tasks. However, these approaches are mostly built on appearance and optical …
Learning multi-view aggregation in the wild for large-scale 3d semantic segmentation
Recent works on 3D semantic segmentation propose to exploit the synergy between images
and point clouds by processing each modality with a dedicated network and projecting …
Cross-modality attention with semantic graph embedding for multi-label classification
Multi-label image and video classification are fundamental yet challenging tasks in computer
vision. The main challenges lie in capturing spatial or temporal dependencies between …
Foley music: Learning to generate music from videos
In this paper, we introduce Foley Music, a system that can synthesize plausible music for a
silent video clip about people playing musical instruments. We first identify two key …
Attention clusters: Purely attention based local feature integration for video classification
Recently, substantial research effort has focused on how to apply CNNs or RNNs to better
capture temporal patterns in videos, so as to improve the accuracy of video classification. In …
Overview of behavior recognition based on deep learning
K Hu, J **, F Zheng, L Weng, Y Ding - Artificial Intelligence Review, 2023 - Springer
Human behavior recognition has always been a hot spot for research in computer vision.
With the wide application of behavior recognition in virtual reality and short video in recent …