Attention mechanisms in computer vision: A survey

MH Guo, TX Xu, JJ Liu, ZN Liu, PT Jiang, TJ Mu… - Computational visual …, 2022 - Springer
Humans can naturally and effectively find salient regions in complex scenes. Motivated by
this observation, attention mechanisms were introduced into computer vision with the aim of …

Human action recognition from various data modalities: A review

Z Sun, Q Ke, H Rahmani, M Bennamoun… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Human Action Recognition (HAR) aims to understand human behavior and assign a label to
each action. It has a wide range of applications, and therefore has been attracting increasing …

Mvitv2: Improved multiscale vision transformers for classification and detection

Y Li, CY Wu, H Fan, K Mangalam… - Proceedings of the …, 2022 - openaccess.thecvf.com
In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for
image and video classification, as well as object detection. We present an improved version …

Multiscale vision transformers

H Fan, B **ong, K Mangalam, Y Li… - Proceedings of the …, 2021 - openaccess.thecvf.com
Abstract We present Multiscale Vision Transformers (MViT) for video and image recognition,
by connecting the seminal idea of multiscale feature hierarchies with transformer models …

Memvit: Memory-augmented multiscale vision transformer for efficient long-term video recognition

CY Wu, Y Li, K Mangalam, H Fan… - Proceedings of the …, 2022 - openaccess.thecvf.com
While today's video recognition systems parse snapshots or short clips accurately, they
cannot connect the dots and reason across a longer range of time yet. Most existing video …

A review on the long short-term memory model

G Van Houdt, C Mosquera, G Nápoles - Artificial Intelligence Review, 2020 - Springer
Long short-term memory (LSTM) has transformed both machine learning and
neurocomputing fields. According to several online sources, this model has improved …

X3d: Expanding architectures for efficient video recognition

C Feichtenhofer - Proceedings of the IEEE/CVF conference …, 2020 - openaccess.thecvf.com
This paper presents X3D, a family of efficient video networks that progressively expand a
tiny 2D image classification architecture along multiple network axes, in space, time, width …

Movinets: Mobile video networks for efficient video recognition

D Kondratyuk, L Yuan, Y Li, L Zhang… - Proceedings of the …, 2021 - openaccess.thecvf.com
Abstract We present Mobile Video Networks (MoViNets), a family of computation and
memory efficient video networks that can operate on streaming video for online inference …

Human activity recognition (har) using deep learning: Review, methodologies, progress and future research directions

P Kumar, S Chauhan, LK Awasthi - Archives of Computational Methods in …, 2024 - Springer
Human activity recognition is essential in many domains, including the medical and smart
home sectors. Using deep learning, we conduct a comprehensive survey of current state …

Vision-based human activity recognition: a survey

DR Beddiar, B Nini, M Sabokrou, A Hadid - Multimedia Tools and …, 2020 - Springer
Human activity recognition (HAR) systems attempt to automatically identify and analyze
human activities using acquired information from various types of sensors. Although several …