A review of multimodal human activity recognition with special emphasis on classification, applications, challenges and future directions

SK Yadav, K Tiwari, HM Pandey, SA Akbar - Knowledge-Based Systems, 2021 - Elsevier
Human activity recognition (HAR) is one of the most important and challenging problems in
the computer vision. It has critical application in wide variety of tasks including gaming …

A review of convolutional-neural-network-based action recognition

G Yao, T Lei, J Zhong - Pattern Recognition Letters, 2019 - Elsevier
Video action recognition is widely applied in video indexing, intelligent surveillance,
multimedia understanding, and other fields. Recently, it was greatly improved by …

Learning spatio-temporal representation with pseudo-3D residual networks

Z Qiu, T Yao, T Mei - Proceedings of the IEEE International …, 2017 - openaccess.thecvf.com
Abstract Convolutional Neural Networks (CNN) have been regarded as a powerful class of
models for image recognition problems. Nevertheless, it is not trivial when utilizing a CNN …

Spatio-temporal LSTM with trust gates for 3D human action recognition

J Liu, A Shahroudy, D Xu, G Wang - … The Netherlands, October 11-14, 2016 …, 2016 - Springer
Abstract 3D action recognition–analysis of human actions based on 3D skeleton data–
becomes popular recently due to its succinctness, robustness, and view-invariant …

VidTr: Video transformer without convolutions

Y Zhang, X Li, C Liu, B Shuai, Y Zhu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Abstract We introduce Video Transformer (VidTr) with separable-attention for video
classification. Comparing with commonly used 3D networks, VidTr is able to aggregate …

A comprehensive study of deep video action recognition

Y Zhu, X Li, C Liu, M Zolfaghari, Y Xiong, C Wu… - arXiv preprint arXiv …, 2020 - arxiv.org
Video action recognition is one of the representative tasks for video understanding. Over the
last decade, we have witnessed great advancements in video action recognition thanks to …

Skeleton-based action recognition using spatio-temporal LSTM network with trust gates

J Liu, A Shahroudy, D Xu, AC Kot… - IEEE transactions on …, 2017 - ieeexplore.ieee.org
Skeleton-based human action recognition has attracted a lot of research attention during the
past few years. Recent works attempted to utilize recurrent neural networks to model the …

A deep hybrid learning model to detect unsafe behavior: Integrating convolution neural networks and long short-term memory

L Ding, W Fang, H Luo, PED Love, B Zhong… - Automation in …, 2018 - Elsevier
Computer vision and pattern recognition approaches have been applied to determine
unsafe behaviors on construction sites. Such approaches have been reliant on the …

MiCT: Mixed 3D/2D convolutional tube for human action recognition

Y Zhou, X Sun, ZJ Zha, W Zeng - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com
Human actions in videos are three-dimensional (3D) signals. Recent attempts use 3D
convolutional neural networks (CNNs) to explore spatio-temporal information for human …

Audio-visual emotion recognition in video clips

F Noroozi, M Marjanovic, A Njegus… - IEEE Transactions …, 2017 - ieeexplore.ieee.org
This paper presents a multimodal emotion recognition system, which is based on the
analysis of audio and visual cues. From the audio channel, Mel-Frequency Cepstral …