Human action recognition from various data modalities: A review
Human Action Recognition (HAR) aims to understand human behavior and assign a label to
each action. It has a wide range of applications, and therefore has been attracting increasing …
A review of multimodal human activity recognition with special emphasis on classification, applications, challenges and future directions
Human activity recognition (HAR) is one of the most important and challenging problems in
computer vision. It has critical applications in a wide variety of tasks including gaming …
Video Swin Transformer
The vision community is witnessing a modeling shift from CNNs to Transformers, where pure
Transformer architectures have attained top accuracy on the major video recognition …
ActionCLIP: A new paradigm for video action recognition
The canonical approach to video action recognition dictates a neural model to do a classic
and standard 1-of-N majority vote task. They are trained to predict a fixed set of predefined …
Two-stream network for sign language recognition and translation
Sign languages are visual languages using manual articulations and non-manual elements
to convey information. For sign language recognition and translation, the majority of existing …
ACTION-Net: Multipath excitation for action recognition
Spatial-temporal, channel-wise, and motion patterns are three complementary and
crucial types of information for video action recognition. Conventional 2D CNNs are …
VidTr: Video transformer without convolutions
We introduce Video Transformer (VidTr) with separable-attention for video
classification. Compared with commonly used 3D networks, VidTr is able to aggregate …
FineGym: A hierarchical video dataset for fine-grained action understanding
On public benchmarks, current action recognition techniques have achieved great success.
However, when used in real-world applications, e.g., sport analysis, which requires the …
Evidential deep learning for open set action recognition
In a real-world scenario, human actions are typically out of the distribution from training data,
which requires a model to both recognize the known actions and reject the unknown …
A comprehensive study of deep video action recognition
Video action recognition is one of the representative tasks for video understanding. Over the
last decade, we have witnessed great advancements in video action recognition thanks to …