Human action recognition from various data modalities: A review

Z Sun, Q Ke, H Rahmani, M Bennamoun… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Human Action Recognition (HAR) aims to understand human behavior and assign a label to
each action. It has a wide range of applications, and therefore has been attracting increasing …

A review of multimodal human activity recognition with special emphasis on classification, applications, challenges and future directions

SK Yadav, K Tiwari, HM Pandey, SA Akbar - Knowledge-Based Systems, 2021 - Elsevier
Human activity recognition (HAR) is one of the most important and challenging problems in
computer vision. It has critical applications in a wide variety of tasks including gaming …

Video swin transformer

Z Liu, J Ning, Y Cao, Y Wei, Z Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
The vision community is witnessing a modeling shift from CNNs to Transformers, where pure
Transformer architectures have attained top accuracy on the major video recognition …

Actionclip: A new paradigm for video action recognition

M Wang, J Xing, Y Liu - arXiv preprint arXiv:2109.08472, 2021 - arxiv.org
The canonical approach to video action recognition dictates a neural model to do a classic
and standard 1-of-N majority vote task. They are trained to predict a fixed set of predefined …

Two-stream network for sign language recognition and translation

Y Chen, R Zuo, F Wei, Y Wu, S Liu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Sign languages are visual languages using manual articulations and non-manual elements
to convey information. For sign language recognition and translation, the majority of existing …

Action-net: Multipath excitation for action recognition

Z Wang, Q She, A Smolic - … of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com
Spatial-temporal, channel-wise, and motion patterns are three complementary and
crucial types of information for video action recognition. Conventional 2D CNNs are …

Vidtr: Video transformer without convolutions

Y Zhang, X Li, C Liu, B Shuai, Y Zhu… - Proceedings of the …, 2021 - openaccess.thecvf.com
We introduce Video Transformer (VidTr) with separable-attention for video
classification. Comparing with commonly used 3D networks, VidTr is able to aggregate …

Finegym: A hierarchical video dataset for fine-grained action understanding

D Shao, Y Zhao, B Dai, D Lin - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
On public benchmarks, current action recognition techniques have achieved great success.
However, when used in real-world applications, e.g. sport analysis, which requires the …

Evidential deep learning for open set action recognition

W Bao, Q Yu, Y Kong - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
In a real-world scenario, human actions are typically out of the distribution from training data,
which requires a model to both recognize the known actions and reject the unknown …

A comprehensive study of deep video action recognition

Y Zhu, X Li, C Liu, M Zolfaghari, Y Xiong, C Wu… - arXiv preprint arXiv …, 2020 - arxiv.org
Video action recognition is one of the representative tasks for video understanding. Over the
last decade, we have witnessed great advancements in video action recognition thanks to …