Track to detect and segment: An online multi-object tracker

J Wu, J Cao, L Song, Y Wang… - Proceedings of the …, 2021 - openaccess.thecvf.com
Most online multi-object trackers perform object detection stand-alone in a neural net without
any input from tracking. In this paper, we present a new online joint detection and tracking …

Human activity recognition (HAR) using deep learning: Review, methodologies, progress and future research directions

P Kumar, S Chauhan, LK Awasthi - Archives of Computational Methods in …, 2024 - Springer
Human activity recognition is essential in many domains, including the medical and smart
home sectors. Using deep learning, we conduct a comprehensive survey of current state …

VPFNet: Improving 3D object detection with virtual point based LiDAR and stereo data fusion

H Zhu, J Deng, Y Zhang, J Ji, Q Mao… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
It has been well recognized that fusing the complementary information from depth-aware
LiDAR point clouds and semantic-rich stereo images would benefit 3D object detection …

Temporal-channel transformer for 3D LiDAR-based video object detection for autonomous driving

Z Yuan, X Song, L Bai, Z Wang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The strong demand of autonomous driving in the industry has led to vigorous interest in 3D
object detection and resulted in many excellent 3D object detection algorithms. However …

Implicit temporal modeling with learnable alignment for video recognition

S Tu, Q Dai, Z Wu, ZQ Cheng, H Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Contrastive language-image pretraining (CLIP) has demonstrated remarkable success in
various image tasks. However, how to extend CLIP with effective temporal modeling is still …

DGRNet: A dual-level graph relation network for video object detection

Q Qi, T Hou, Y Lu, Y Yan… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Video object detection is a fundamental and important task in computer vision. One mainstay
solution for this task is to aggregate features from different frames to enhance the detection …