An overview of violence detection techniques: current challenges and future directions

N Mumtaz, N Ejaz, S Habib, SM Mohsin… - Artificial intelligence …, 2023 - Springer
The Big Video Data generated in today's smart cities has raised concerns from its
purposeful usage perspective, where surveillance cameras, among many others, are the …

UniFormer: Unifying convolution and self-attention for visual recognition

K Li, Y Wang, J Zhang, P Gao, G Song… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
It is a challenging task to learn discriminative representation from images and videos, due to
large local redundancy and complex global dependency in these visual data. Convolution …

ActionCLIP: A new paradigm for video action recognition

M Wang, J Xing, Y Liu - arXiv preprint arXiv:2109.08472, 2021 - arxiv.org
The canonical approach to video action recognition dictates a neural model to do a classic
and standard 1-of-N majority vote task. They are trained to predict a fixed set of predefined …
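
The "1-of-N" setup the snippet describes can be made concrete: the classifier emits one score per predefined class and the prediction is whichever class scores highest. A minimal sketch of that canonical approach only (not the alternative the paper proposes), with a hypothetical label set and hypothetical logits:

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_one_of_n(logits: Sequence[float], labels: Sequence[str]) -> str:
    """Canonical 1-of-N prediction over a fixed, predefined label set."""
    if len(logits) != len(labels):
        raise ValueError("expected exactly one logit per predefined label")
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best]


# Hypothetical label set and classifier outputs, for illustration only.
LABELS = ["walking", "running", "jumping"]
print(predict_one_of_n([0.2, 2.5, -1.0], LABELS))  # running
```

Because the output layer has exactly one logit per entry in LABELS, a class outside that list (say, "dancing") can never be predicted without changing and retraining the head; this is the "fixed set of predefined" categories the snippet refers to.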

TDN: Temporal difference networks for efficient action recognition

L Wang, Z Tong, B Ji, G Wu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Temporal modeling still remains challenging for action recognition in videos. To mitigate this
issue, this paper presents a new video architecture, termed Temporal Difference Network …
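
The basic signal behind temporal-difference modeling is the change between consecutive frames. A minimal sketch of plain frame differencing, assuming grayscale frames stored as nested lists; this illustrates the raw cue only and is not the TDN architecture, and the names `frame_differences` and `motion_energy` are hypothetical:

```python
from typing import List

# A grayscale frame as rows of pixel intensities (hypothetical representation).
Frame = List[List[float]]


def frame_differences(frames: List[Frame]) -> List[Frame]:
    """Return D_t = F_{t+1} - F_t for every pair of consecutive frames."""
    diffs: List[Frame] = []
    for prev, nxt in zip(frames, frames[1:]):
        if len(prev) != len(nxt) or any(len(a) != len(b) for a, b in zip(prev, nxt)):
            raise ValueError("all frames must have the same shape")
        diffs.append([[b - a for a, b in zip(row_p, row_n)]
                      for row_p, row_n in zip(prev, nxt)])
    return diffs


def motion_energy(diff: Frame) -> float:
    """Mean absolute pixel change: a crude scalar summary of motion."""
    values = [abs(v) for row in diff for v in row]
    return sum(values) / len(values)


# Three 2x2 frames: static for one step, then one pixel brightens.
frames = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
]
diffs = frame_differences(frames)
print([motion_energy(d) for d in diffs])  # [0.0, 0.25]
```

Static content cancels out in the difference, so only the changed pixel contributes; T frames yield T-1 difference maps.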

MoViNets: Mobile video networks for efficient video recognition

D Kondratyuk, L Yuan, Y Li, L Zhang… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present Mobile Video Networks (MoViNets), a family of computation- and
memory-efficient video networks that can operate on streaming video for online inference …
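
Online inference on streaming video requires that each output depend only on frames already seen, with recent inputs cached so that earlier frames need not be re-read. The MoViNets paper introduces a stream buffer for this purpose. The sketch below is a toy illustration of the principle: a causal temporal convolution over scalar per-frame features. The names `StreamingCausalConv` and `causal_conv_offline` and the weights are hypothetical, and this is not the paper's implementation:

```python
from collections import deque
from typing import Deque, List, Sequence


class StreamingCausalConv:
    """Causal temporal convolution over per-frame scalar features.

    The output at time t depends only on inputs t-k+1..t, so frames can be
    fed one at a time. The last k-1 inputs are cached between calls, so no
    earlier frame has to be re-read.
    """

    def __init__(self, weights: Sequence[float]):
        if len(weights) < 1:
            raise ValueError("need at least one weight")
        self.weights = list(weights)  # weights[0] multiplies the oldest input
        k = len(self.weights)
        # Zero-filled cache plays the role of causal (left-only) padding.
        self.cache: Deque[float] = deque([0.0] * (k - 1), maxlen=k - 1)

    def step(self, x: float) -> float:
        window = list(self.cache) + [x]
        y = sum(w * v for w, v in zip(self.weights, window))
        self.cache.append(x)  # maxlen drops the oldest entry automatically
        return y


def causal_conv_offline(xs: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Reference: the same causal convolution computed over the whole clip at once."""
    k = len(weights)
    padded = [0.0] * (k - 1) + list(xs)
    return [sum(w * v for w, v in zip(weights, padded[t:t + k])) for t in range(len(xs))]


weights = [0.25, 0.25, 0.5]
features = [1.0, 2.0, 3.0, 4.0]
stream = StreamingCausalConv(weights)
online = [stream.step(x) for x in features]
print(online)  # [0.5, 1.25, 2.25, 3.25]
print(online == causal_conv_offline(features, weights))  # True
```

The frame-by-frame outputs match the whole-clip computation while keeping only k-1 past values in memory, which is why causal operations are what make streaming inference possible.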

ActionCLIP: Adapting language-image pretrained models for video action recognition

M Wang, J Xing, J Mei, Y Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The canonical approach to video action recognition dictates a neural network model to do a
classic and standard 1-of-N majority vote task. They are trained to predict a fixed set of …

Long movie clip classification with state-space video models

MM Islam, G Bertasius - European Conference on Computer Vision, 2022 - Springer
Most modern video recognition models are designed to operate on short video clips (e.g.,
5–10 s in length). Thus, it is challenging to apply such models to long movie understanding …
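
Applying a short-clip model to a full movie typically means cutting the movie into many such clips and aggregating their per-clip scores. A minimal sketch of that common baseline (not the state-space approach named in the title), with hypothetical function names and scores:

```python
import math
from typing import List, Sequence, Tuple


def clip_windows(duration_s: float, clip_s: float) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into consecutive clip_s-second windows; the last may be shorter."""
    if clip_s <= 0:
        raise ValueError("clip length must be positive")
    n = math.ceil(duration_s / clip_s)
    return [(i * clip_s, min((i + 1) * clip_s, duration_s)) for i in range(n)]


def average_clip_scores(clip_scores: Sequence[Sequence[float]]) -> List[float]:
    """Naive movie-level prediction: the per-class mean of the clip scores.

    Averaging is order-invariant, so any temporal structure across clips is lost.
    """
    n = len(clip_scores)
    return [sum(col) / n for col in zip(*clip_scores)]


# A 2-hour movie cut into 10 s clips, the upper end of the clip length quoted above.
print(len(clip_windows(2 * 60 * 60, 10)))  # 720
print(clip_windows(25, 10))  # [(0, 10), (10, 20), (20, 25)]
print(average_clip_scores([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]))  # [0.5, 0.5]
```

Even at 10 s per clip, a two-hour movie needs 720 forward passes, and shuffling the clips would not change the averaged prediction. These are the two costs that make long-movie understanding hard for short-clip models.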

Stand-alone inter-frame attention in video models

F Long, Z Qiu, Y Pan, T Yao, J Luo… - Proceedings of the …, 2022 - openaccess.thecvf.com
Motion, as the uniqueness of a video, has been critical to the development of video
understanding models. Modern deep learning models leverage motion by either executing …

The dawn of quantum natural language processing

R Di Sipio, JH Huang, SYC Chen… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
In this paper, we discuss the initial attempts at boosting the understanding of human language
based on deep-learning models with quantum computing. We successfully train a quantum …

Motion-driven visual tempo learning for video-based action recognition

Y Liu, J Yuan, Z Tu - IEEE Transactions on Image Processing, 2022 - ieeexplore.ieee.org
Action visual tempo characterizes the dynamics and the temporal scale of an action, which is
helpful to distinguish human actions that share high similarities in visual dynamics and …
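
Because the same action can unfold at different speeds, one generic way to expose a model to several tempos is to sample equal-length clips from a single video at different frame strides. The sketch shows only that sampling step. It is not the motion-driven tempo module proposed in the paper, and the function names and parameter values are hypothetical:

```python
from typing import Dict, List, Sequence


def sample_indices(num_frames: int, clip_len: int, stride: int, start: int = 0) -> List[int]:
    """Indices of clip_len frames taken every `stride` frames, beginning at `start`."""
    if clip_len < 1 or stride < 1:
        raise ValueError("clip_len and stride must be positive")
    last = start + (clip_len - 1) * stride
    if start < 0 or last >= num_frames:
        raise ValueError("sampled clip does not fit inside the video")
    return [start + i * stride for i in range(clip_len)]


def multi_tempo_clips(num_frames: int, clip_len: int, strides: Sequence[int]) -> Dict[int, List[int]]:
    """One equal-length clip per stride: larger strides cover a longer time span."""
    return {s: sample_indices(num_frames, clip_len, s) for s in strides}


clips = multi_tempo_clips(num_frames=64, clip_len=8, strides=[1, 2, 4])
for stride, idx in clips.items():
    print(stride, idx)
# 1 [0, 1, 2, 3, 4, 5, 6, 7]
# 2 [0, 2, 4, 6, 8, 10, 12, 14]
# 4 [0, 4, 8, 12, 16, 20, 24, 28]
```

Every clip has 8 frames, but stride 1 spans 8 source frames while stride 4 spans 29. A slow performance of an action sampled at a large stride can therefore look similar to a fast one sampled at a small stride.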