Human action recognition from various data modalities: A review
Human Action Recognition (HAR) aims to understand human behavior and assign a label to
each action. It has a wide range of applications, and therefore has been attracting increasing …
each action. It has a wide range of applications, and therefore has been attracting increasing …
Human activity recognition: Review, taxonomy and open challenges
Nowadays, Human Activity Recognition (HAR) is being widely used in a variety of domains,
and vision and sensor-based data enable cutting-edge technologies to detect, recognize …
and vision and sensor-based data enable cutting-edge technologies to detect, recognize …
Less is more: Clipbert for video-and-language learning via sparse sampling
The canonical approach to video-and-language learning (eg, video question answering)
dictates a neural model to learn from offline-extracted dense video features from vision …
dictates a neural model to learn from offline-extracted dense video features from vision …
Dynamic neural networks: A survey
Dynamic neural network is an emerging research topic in deep learning. Compared to static
models which have fixed computational graphs and parameters at the inference stage …
models which have fixed computational graphs and parameters at the inference stage …
Revisiting the" video" in video-language understanding
What makes a video task uniquely suited for videos, beyond what can be understood from a
single image? Building on recent progress in self-supervised image-language models, we …
single image? Building on recent progress in self-supervised image-language models, we …
A dynamic multi-scale voxel flow network for video prediction
The performance of video prediction has been greatly boosted by advanced deep neural
networks. However, most of the current methods suffer from large model sizes and require …
networks. However, most of the current methods suffer from large model sizes and require …
X3d: Expanding architectures for efficient video recognition
C Feichtenhofer - Proceedings of the IEEE/CVF conference …, 2020 - openaccess.thecvf.com
This paper presents X3D, a family of efficient video networks that progressively expand a
tiny 2D image classification architecture along multiple network axes, in space, time, width …
tiny 2D image classification architecture along multiple network axes, in space, time, width …
Rethinking video vits: Sparse video tubes for joint image and video learning
AJ Piergiovanni, W Kuo… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We present a simple approach which can turn a ViT encoder into an efficient video model,
which can seamlessly work with both image and video inputs. By sparsely sampling the …
which can seamlessly work with both image and video inputs. By sparsely sampling the …
A comprehensive study of deep video action recognition
Video action recognition is one of the representative tasks for video understanding. Over the
last decade, we have witnessed great advancements in video action recognition thanks to …
last decade, we have witnessed great advancements in video action recognition thanks to …
Listen to look: Action recognition by previewing audio
In the face of the video data deluge, today's expensive clip-level classifiers are increasingly
impractical. We propose a framework for efficient action recognition in untrimmed video that …
impractical. We propose a framework for efficient action recognition in untrimmed video that …