- Academic Search

Age-invariant face recognition by multi-feature fusion and decomposition with self-attention

C Yan, L Meng, L Li, J Zhang, Z Wang, J Yin… - ACM Transactions on …, 2022 - dl.acm.org

Different from general face recognition, age-invariant face recognition (AIFR) aims at
matching faces with a big age gap. Previous discriminative methods usually focus on …

Cited by 179

[PDF] ieee.org

Deep multimodal representation learning: A survey

W Guo, J Wang, S Wang - IEEE Access, 2019 - ieeexplore.ieee.org

Multimodal representation learning, which aims to narrow the heterogeneity gap among
different modalities, plays an indispensable role in the utilization of ubiquitous multimodal …

Cited by 540

[PDF] thecvf.com

Epic-fusion: Audio-visual temporal binding for egocentric action recognition

E Kazakos, A Nagrani, A Zisserman… - Proceedings of the …, 2019 - openaccess.thecvf.com

We focus on multi-modal fusion for egocentric action recognition, and propose a novel
architecture for multi-modal temporal-binding, i.e. the combination of modalities within a …

Cited by 428

[HTML] mdpi.com

[HTML] Deep learning innovations in video classification: A survey on techniques and dataset evaluations

M Mao, A Lee, M Hong - Electronics, 2024 - mdpi.com

Video classification has achieved remarkable success in recent years, driven by advanced
deep learning models that automatically categorize video content. This paper provides a …

Cited by 5

[PDF] thecvf.com

Music gesture for visual sound separation

C Gan, D Huang, H Zhao… - Proceedings of the …, 2020 - openaccess.thecvf.com

Recent deep learning approaches have achieved impressive performance on visual sound
separation tasks. However, these approaches are mostly built on appearance and optical …

Cited by 227

[PDF] thecvf.com

Learning multi-view aggregation in the wild for large-scale 3D semantic segmentation

D Robert, B Vallet, L Landrieu - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

Recent works on 3D semantic segmentation propose to exploit the synergy between images
and point clouds by processing each modality with a dedicated network and projecting …

Cited by 80

[PDF] aaai.org

Cross-modality attention with semantic graph embedding for multi-label classification

R You, Z Guo, L Cui, X Long, Y Bao… - Proceedings of the AAAI …, 2020 - ojs.aaai.org

Multi-label image and video classification are fundamental yet challenging tasks in computer
vision. The main challenges lie in capturing spatial or temporal dependencies between …

Cited by 210

[PDF] arxiv.org

Foley music: Learning to generate music from videos

C Gan, D Huang, P Chen, JB Tenenbaum… - Computer Vision–ECCV …, 2020 - Springer

In this paper, we introduce Foley Music, a system that can synthesize plausible music for a
silent video clip about people playing musical instruments. We first identify two key …

Cited by 154

[PDF] thecvf.com

Attention clusters: Purely attention based local feature integration for video classification

X Long, C Gan, G De Melo, J Wu… - Proceedings of the …, 2018 - openaccess.thecvf.com

Recently, substantial research effort has focused on how to apply CNNs or RNNs to better
capture temporal patterns in videos, so as to improve the accuracy of video classification. In …

Cited by 287

Overview of behavior recognition based on deep learning

K Hu, J Jin, F Zheng, L Weng, Y Ding - Artificial intelligence review, 2023 - Springer

Human behavior recognition has always been a hot spot for research in computer vision.
With the wide application of behavior recognition in virtual reality and short video in recent …

Cited by 73

Multimodal keyless attention fusion for video classification

Age-invariant face recognition by multi-feature fusion and decomposition with self-attention
