AMS-Net: Modeling adaptive multi-granularity spatio-temporal cues for video action recognition

Q Wang, Q Hu, Z Gao, P Li, Q Hu - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Effective spatio-temporal modeling as a core of video representation learning is challenged
by complex scale variations in spatio-temporal cues in videos, especially different visual …

Learning from untrimmed videos: Self-supervised video representation learning with hierarchical consistency

Z Qing, S Zhang, Z Huang, Y Xu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Natural videos provide rich visual contents for self-supervised learning. Yet most existing
approaches for learning spatio-temporal representations rely on manually trimmed videos …

Highlighting object category immunity for the generalization of human-object interaction detection

X Liu, YL Li, C Lu - Proceedings of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Human-Object Interaction (HOI) detection plays a core role in activity
understanding. As a compositional learning problem (human-verb-object), studying its …

Self-supervised learning from untrimmed videos via hierarchical consistency

Z Qing, S Zhang, Z Huang, Y Xu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Natural untrimmed videos provide rich visual content for self-supervised learning. Yet most
previous efforts to learn spatio-temporal representations rely on manually trimmed videos …

Shifted GCN-GAT and cumulative-transformer based social relation recognition for long videos

H Wang, Y Hu, Y Zhu, J Qi, B Wu - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Social Relation Recognition is an important part of Video Understanding, providing insights
into the information that videos convey. Most previous works mainly focused on graph …

Markov Progressive Framework, a Universal Paradigm for Modeling Long Videos

B Pang, G Peng, Y Li, C Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
The computational complexity of video models increases linearly with the square number of
frames. Thus, constrained by computational resources, training video models to learn long …

High-order correlation network for video recognition

W Dong, Z Wang, B Zhang, J Zhang… - 2022 International Joint …, 2022 - ieeexplore.ieee.org
How to model global video representation is an important research content of video
recognition. Among current convolutional neural network (CNN) based methods, only using …

Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis

AH Fadaei, MRA Dehaqani - ar…
… end-to-end action recognition models on long videos is fundamental and crucial
for long-video action understanding. Due to the unaffordable cost of end-to-end training on …