Vision transformers for action recognition: A survey
Vision transformers are emerging as a powerful tool to solve computer vision problems.
Recent techniques have also proven the efficacy of transformers beyond the image domain …
Deep learning-based action detection in untrimmed videos: A survey
Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …
STAR-Transformer: a spatio-temporal cross attention transformer for human action recognition
In action recognition, although the combination of spatio-temporal videos and skeleton
features can improve the recognition performance, a separate model and balancing feature …
VideoLLM: Modeling video sequence with large language models
With the exponential growth of video data, there is an urgent need for automated technology
to analyze and comprehend video content. However, existing video understanding models …
Video transformers: A survey
Transformer models have shown great success handling long-range interactions, making
them a promising tool for modeling video. However, they lack inductive biases and scale …
Hybrid relation guided set matching for few-shot action recognition
Current few-shot action recognition methods reach impressive performance by learning
discriminative features for each video via episodic training and designing various temporal …
MoLo: Motion-augmented long-short contrastive learning for few-shot action recognition
Current state-of-the-art approaches for few-shot action recognition achieve promising
performance by conducting frame-level matching on learned visual features. However, they …
Flow-guided transformer for video inpainting
We propose a flow-guided transformer, which innovatively leverages the motion discrepancy
exposed by optical flows to instruct the attention retrieval in transformer for high fidelity video …
Real-time online video detection with temporal smoothing transformers
Streaming video recognition reasons about objects and their actions in every frame of a
video. A good streaming recognition model captures both long-term dynamics and short …
Progress-aware online action segmentation for egocentric procedural task videos
We address the problem of online action segmentation for egocentric procedural task
videos. While previous studies have mostly focused on offline action segmentation where …