Human action recognition from various data modalities: A review

Z Sun, Q Ke, H Rahmani, M Bennamoun… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Human Action Recognition (HAR) aims to understand human behavior and assign a label to
each action. It has a wide range of applications, and therefore has been attracting increasing …

A survey of human action recognition and posture prediction

N Ma, Z Wu, Y Cheung, Y Guo, Y Gao… - Tsinghua Science …, 2022 - ieeexplore.ieee.org
Human action recognition and posture prediction aim to recognize and predict, respectively, the actions and postures of persons in videos. They are both active research topics in …

Videocomposer: Compositional video synthesis with motion controllability

X Wang, H Yuan, S Zhang, D Chen… - Advances in …, 2024 - proceedings.neurips.cc
The pursuit of controllability as a higher standard of visual content creation has yielded
remarkable progress in customizable image synthesis. However, achieving controllable …

A comprehensive study of deep video action recognition

Y Zhu, X Li, C Liu, M Zolfaghari, Y Xiong, C Wu… - arXiv preprint arXiv …, 2020 - arxiv.org
Video action recognition is one of the representative tasks for video understanding. Over the
last decade, we have witnessed great advancements in video action recognition thanks to …

Listen to look: Action recognition by previewing audio

R Gao, TH Oh, K Grauman… - Proceedings of the …, 2020 - openaccess.thecvf.com
In the face of the video data deluge, today's expensive clip-level classifiers are increasingly
impractical. We propose a framework for efficient action recognition in untrimmed video that …

Two-stream consensus network for weakly-supervised temporal action localization

Y Zhai, L Wang, W Tang, Q Zhang, J Yuan… - Computer Vision–ECCV …, 2020 - Springer
Weakly-supervised Temporal Action Localization (W-TAL) aims to classify and
localize all action instances in an untrimmed video under only video-level supervision …

Mm-vit: Multi-modal video transformer for compressed video action recognition

J Chen, CM Ho - Proceedings of the IEEE/CVF winter …, 2022 - openaccess.thecvf.com
This paper presents a pure transformer-based approach, dubbed the Multi-Modal Video
Transformer (MM-ViT), for video action recognition. Different from other schemes which …

You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos

X Fang, D Liu, P Zhou, G Nan - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Given an untrimmed video, temporal sentence grounding (TSG) aims to locate the target
moment that semantically corresponds to a sentence query. Although previous respectable works …

Gate-shift networks for video action recognition

S Sudhakaran, S Escalera… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Deep 3D CNNs for video action recognition are designed to learn powerful representations
in the joint spatio-temporal feature space. In practice, however, because of the large number …

Spatio-temporal attention networks for action recognition and detection

J Li, X Liu, W Zhang, M Zhang, J Song… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Recently, 3D Convolutional Neural Network (3D CNN) models have been widely studied for
video sequences and have achieved satisfactory performance in action recognition and detection …