Object detection using deep learning methods in traffic scenarios

A Boukerche, Z Hou - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
The recent boom of autonomous driving nowadays has made object detection in traffic
scenes a hot topic of research. Designed to classify and locate instances in the image, this is …

Action recognition based on RGB and skeleton data sets: A survey

R Yue, Z Tian, S Du - Neurocomputing, 2022 - Elsevier
Action recognition is a major branch of computer vision research. As a widely used
technology, action recognition has been applied to human–computer interaction, intelligent …

VideoComposer: Compositional video synthesis with motion controllability

X Wang, H Yuan, S Zhang, D Chen… - Advances in …, 2024 - proceedings.neurips.cc
The pursuit of controllability as a higher standard of visual content creation has yielded
remarkable progress in customizable image synthesis. However, achieving controllable …

MeMViT: Memory-augmented multiscale vision transformer for efficient long-term video recognition

CY Wu, Y Li, K Mangalam, H Fan… - Proceedings of the …, 2022 - openaccess.thecvf.com
While today's video recognition systems parse snapshots or short clips accurately, they
cannot connect the dots and reason across a longer range of time yet. Most existing video …

X3D: Expanding architectures for efficient video recognition

C Feichtenhofer - Proceedings of the IEEE/CVF conference …, 2020 - openaccess.thecvf.com
This paper presents X3D, a family of efficient video networks that progressively expand a
tiny 2D image classification architecture along multiple network axes, in space, time, width …

SlowFast networks for video recognition

C Feichtenhofer, H Fan, J Malik… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
We present SlowFast networks for video recognition. Our model involves (i) a Slow pathway,
operating at low frame rate, to capture spatial semantics, and (ii) a Fast pathway, operating …
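The Slow/Fast split described in the snippet comes down to sampling the same clip at two temporal strides. Below is a minimal sketch of that sampling step only, assuming a Slow-pathway stride `tau` and a frame-rate ratio `alpha`. The function name `slowfast_sample` is hypothetical, and the defaults tau=16 and alpha=8 are illustrative assumptions. The two convolutional pathways and their lateral connections are omitted.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def slowfast_sample(
    frames: Sequence[T], tau: int = 16, alpha: int = 8
) -> Tuple[Sequence[T], Sequence[T]]:
    """Split one clip into Slow and Fast pathway inputs (hypothetical helper).

    Assumption: the Slow pathway keeps one frame every `tau` frames, and the
    Fast pathway runs at `alpha` times that frame rate (stride tau // alpha).
    """
    if tau % alpha != 0:
        raise ValueError("tau must be a multiple of alpha")
    slow = frames[::tau]           # low frame rate: sparse frames for spatial semantics
    fast = frames[:: tau // alpha]  # alpha-times higher frame rate for fast motion
    return slow, fast


# Usage: frame indices stand in for decoded frames.
slow, fast = slowfast_sample(list(range(64)))
print(slow)       # [0, 16, 32, 48]
print(len(fast))  # 32
```

Slicing works the same on a list of frames or on an array indexed by time, so the sketch does not depend on any particular video library.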

TCGL: Temporal contrastive graph for self-supervised video representation learning

Y Liu, K Wang, L Liu, H Lan, L Lin - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Video self-supervised learning is a challenging task, which requires significant expressive
power from the model to leverage rich spatial-temporal knowledge and generate effective …

Learning in the frequency domain

K Xu, M Qin, F Sun, Y Wang… - Proceedings of the …, 2020 - openaccess.thecvf.com
Deep neural networks have achieved remarkable success in computer vision tasks. Existing
neural networks mainly operate in the spatial domain with fixed input sizes. For practical …

DVC: An end-to-end deep video compression framework

G Lu, W Ouyang, D Xu, X Zhang… - Proceedings of the …, 2019 - openaccess.thecvf.com
Conventional video compression approaches use the predictive coding architecture and
encode the corresponding motion information and residual information. In this paper, taking …
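The snippet attributes a predictive-coding loop to conventional codecs: estimate motion, predict the current frame from the previous one, then encode the motion plus the residual. The sketch below illustrates that classical loop on a toy 1D signal. It assumes a single global integer shift stands in for "motion" and that the residual is not quantized. It does not reproduce DVC's learned optical-flow and auto-encoder networks, and all function names are hypothetical.

```python
from typing import List, Tuple


def shift(signal: List[int], d: int) -> List[int]:
    """Shift a 1D signal right by d samples, repeating edge values."""
    n = len(signal)
    return [signal[min(max(i - d, 0), n - 1)] for i in range(n)]


def estimate_motion(prev: List[int], cur: List[int], max_shift: int = 3) -> int:
    """Pick the global shift that minimizes the sum of absolute differences."""
    return min(
        range(-max_shift, max_shift + 1),
        key=lambda d: sum(abs(a - b) for a, b in zip(shift(prev, d), cur)),
    )


def encode(prev: List[int], cur: List[int]) -> Tuple[int, List[int]]:
    """Return (motion, residual); a real codec would quantize and entropy-code both."""
    mv = estimate_motion(prev, cur)
    pred = shift(prev, mv)  # motion-compensated prediction
    residual = [c - p for c, p in zip(cur, pred)]
    return mv, residual


def decode(prev: List[int], mv: int, residual: List[int]) -> List[int]:
    """Rebuild the frame as prediction + residual."""
    pred = shift(prev, mv)
    return [p + r for p, r in zip(pred, residual)]


prev = [0, 0, 1, 5, 9, 5, 1, 0, 0, 0]
cur = shift(prev, 2)
mv, residual = encode(prev, cur)
print(mv, residual)  # 2 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Because the residual is left unquantized here, decoding is exact. Compression gains in practice come from the residual being small and cheap to code once motion is compensated.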

Long-term feature banks for detailed video understanding

CY Wu, C Feichtenhofer, H Fan, K He… - Proceedings of the …, 2019 - openaccess.thecvf.com
To understand the world, we humans constantly need to relate the present to the past, and
put events in context. In this paper, we enable existing video models to do the same. We …