Human action recognition from various data modalities: A review
Human Action Recognition (HAR) aims to understand human behavior and assign a label to
each action. It has a wide range of applications, and therefore has been attracting increasing …
each action. It has a wide range of applications, and therefore has been attracting increasing …
Videomamba: State space model for efficient video understanding
Addressing the dual challenges of local redundancy and global dependencies in video
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …
St-adapter: Parameter-efficient image-to-video transfer learning
Capitalizing on large pre-trained models for various downstream tasks of interest have
recently emerged with promising performance. Due to the ever-growing model size, the …
recently emerged with promising performance. Due to the ever-growing model size, the …
Uniformer: Unifying convolution and self-attention for visual recognition
It is a challenging task to learn discriminative representation from images and videos, due to
large local redundancy and complex global dependency in these visual data. Convolution …
large local redundancy and complex global dependency in these visual data. Convolution …
Tdn: Temporal difference networks for efficient action recognition
Temporal modeling still remains challenging for action recognition in videos. To mitigate this
issue, this paper presents a new video architecture, termed as Temporal Difference Network …
issue, this paper presents a new video architecture, termed as Temporal Difference Network …
Uniformerv2: Unlocking the potential of image vits for video understanding
The prolific performances of Vision Transformers (ViTs) in image tasks have prompted
research into adapting the image ViTs for video tasks. However, the substantial gap …
research into adapting the image ViTs for video tasks. However, the substantial gap …
Mamba-360: Survey of state space models as transformer alternative for long sequence modelling: Methods, applications, and challenges
Sequence modeling is a crucial area across various domains, including Natural Language
Processing (NLP), speech recognition, time series forecasting, music generation, and …
Processing (NLP), speech recognition, time series forecasting, music generation, and …
Uniformerv2: Spatiotemporal learning by arming image vits with video uniformer
Learning discriminative spatiotemporal representation is the key problem of video
understanding. Recently, Vision Transformers (ViTs) have shown their power in learning …
understanding. Recently, Vision Transformers (ViTs) have shown their power in learning …
A cooperative vehicle-infrastructure system for road hazards detection with edge intelligence
Road hazards (RH) have always been the cause of many serious traffic accidents. These
have posed a threat to the safety of drivers, passengers, and pedestrians, and have also …
have posed a threat to the safety of drivers, passengers, and pedestrians, and have also …
Diversifying spatial-temporal perception for video domain generalization
Video domain generalization aims to learn generalizable video classification models for
unseen target domains by training in a source domain. A critical challenge of video domain …
unseen target domains by training in a source domain. A critical challenge of video domain …