Human action recognition from various data modalities: A review
Human Action Recognition (HAR) aims to understand human behavior and assign a label to
each action. It has a wide range of applications, and therefore has been attracting increasing …
A survey of human action recognition and posture prediction
Human action recognition and posture prediction aim to recognize and predict respectively
the action and postures of persons in videos. They are both active research topics in …
Videocomposer: Compositional video synthesis with motion controllability
The pursuit of controllability as a higher standard of visual content creation has yielded
remarkable progress in customizable image synthesis. However, achieving controllable …
A comprehensive study of deep video action recognition
Video action recognition is one of the representative tasks for video understanding. Over the
last decade, we have witnessed great advancements in video action recognition thanks to …
Listen to look: Action recognition by previewing audio
In the face of the video data deluge, today's expensive clip-level classifiers are increasingly
impractical. We propose a framework for efficient action recognition in untrimmed video that …
Two-stream consensus network for weakly-supervised temporal action localization
Weakly-supervised Temporal Action Localization (W-TAL) aims to classify and
localize all action instances in an untrimmed video under only video-level supervision …
Mm-vit: Multi-modal video transformer for compressed video action recognition
J Chen, CM Ho - Proceedings of the IEEE/CVF winter …, 2022 - openaccess.thecvf.com
This paper presents a pure transformer-based approach, dubbed the Multi-Modal Video
Transformer (MM-ViT), for video action recognition. Different from other schemes which …
You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos
Given an untrimmed video, temporal sentence grounding (TSG) aims to locate a target
moment semantically according to a sentence query. Although previous respectable works …
Gate-shift networks for video action recognition
Deep 3D CNNs for video action recognition are designed to learn powerful representations
in the joint spatio-temporal feature space. In practice however, because of the large number …
Spatio-temporal attention networks for action recognition and detection
Recently, 3D Convolutional Neural Network (3D CNN) models have been widely studied for
video sequences and achieved satisfying performance in action recognition and detection …