Frozen CLIP models are efficient video learners

Z Lin, S Geng, R Zhang, P Gao, G De Melo… - … on Computer Vision, 2022 - Springer
Video recognition has been dominated by the end-to-end learning paradigm–first initializing
a video recognition model with weights of a pretrained image model and then conducting …

MoViNets: Mobile video networks for efficient video recognition

D Kondratyuk, L Yuan, Y Li, L Zhang… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present Mobile Video Networks (MoViNets), a family of computation and
memory efficient video networks that can operate on streaming video for online inference …

A comprehensive review of recent deep learning techniques for human activity recognition

VT Le, K Tran-Trung, VT Hoang - Computational Intelligence …, 2022 - Wiley Online Library
Human action recognition is an important field in computer vision that has attracted
remarkable attention from researchers. This survey aims to provide a comprehensive …

Top-heavy CapsNets based on spatiotemporal non-local for action recognition

MH Ha - Journal of Computing Theories and Applications, 2024 - dl.futuretechsci.org
To effectively comprehend human actions, we have developed a Deep Neural Network
(DNN) that utilizes inner spatiotemporal non-locality to capture meaningful semantic context …

Improving human activity recognition integrating LSTM with different data sources: Features, object detection and skeleton tracking

JD Domingo, J Gomez-Garcia-Bermejo… - IEEE Access, 2022 - ieeexplore.ieee.org
Over the past few years, technologies in the field of computer vision have greatly advanced.
The use of deep neural networks, together with the development of computing capabilities …

Scene image and human skeleton-based dual-stream human action recognition

Q Xu, W Zheng, Y Song, C Zhang, X Yuan… - Pattern Recognition Letters, 2021 - Elsevier
The dual stream-based human action recognition model offers the advantage of high
recognition accuracy, but the algorithm is less robust in case of lighting changes. The human …

A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning

L Vilaca, Y Yu, P Vinan - arXiv preprint arXiv:2412.00049, 2024 - arxiv.org
Audio-visual correlation learning aims to capture and understand natural phenomena
between audio and visual data. The rapid growth of Deep Learning propelled the …

Multiscale human activity recognition and anticipation network

Y Xing, S Golodetz, A Everitt… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Deep convolutional neural networks have been leveraged to achieve huge improvements in
video understanding and human activity recognition performance in the past decade …

Spatial-temporal multiscale feature optimization based two-stream convolutional neural network for action recognition

L Xia, W Fu - Cluster Computing, 2024 - Springer
Human action recognition is one of the most challenging tasks in computer vision due to
background noise interference and video frame redundancy. Therefore, we propose a two …

Multimodal Abnormal Event Detection in Public Transportation

D Tsiktsiris, A Lalas, M Dasygenis, K Votis - IEEE Access, 2024 - ieeexplore.ieee.org
This work addresses the growing concerns about security and passenger safety on public
transportation. With the increasing demand for public transport and the rise in road traffic …