New generation deep learning for video object detection: A survey
Video object detection, a basic task in the computer vision field, is rapidly evolving and
widely used. In recent years, deep learning methods have rapidly become widespread in the …
widely used. In recent years, deep learning methods have rapidly become widespread in the …
A review of video object detection: Datasets, metrics and methods
Although there are well established object detection methods based on static images, their
application to video data on a frame by frame basis faces two shortcomings:(i) lack of …
application to video data on a frame by frame basis faces two shortcomings:(i) lack of …
Bevdet4d: Exploit temporal cues in multi-camera 3d object detection
J Huang, G Huang - arxiv preprint arxiv:2203.17054, 2022 - arxiv.org
Single frame data contains finite information which limits the performance of the existing
vision-based multi-camera 3D object detection paradigms. For fundamentally pushing the …
vision-based multi-camera 3D object detection paradigms. For fundamentally pushing the …
Transflow: Transformer as flow learner
Optical flow is an indispensable building block for various important computer vision tasks,
including motion estimation, object tracking, and disparity measurement. In this work, we …
including motion estimation, object tracking, and disparity measurement. In this work, we …
Tf-blender: Temporal feature blender for video object detection
Video objection detection is a challenging task because isolated video frames may
encounter appearance deterioration, which introduces great confusion for detection. One of …
encounter appearance deterioration, which introduces great confusion for detection. One of …
Ts-cam: Token semantic coupled attention map for weakly supervised object localization
Weakly supervised object localization (WSOL) is a challenging problem when given image
category labels but requires to learn object localization models. Optimizing a convolutional …
category labels but requires to learn object localization models. Optimizing a convolutional …
Disentangled non-local neural networks
The non-local block is a popular module for strengthening the context modeling ability of a
regular convolutional neural network. This paper first studies the non-local block in depth …
regular convolutional neural network. This paper first studies the non-local block in depth …
Memory enhanced global-local aggregation for video object detection
How do humans recognize an object in a piece of video? Due to the deteriorated quality of
single frame, it may be hard for people to identify an occluded object in this frame by just …
single frame, it may be hard for people to identify an occluded object in this frame by just …
Tube-Link: A flexible cross tube framework for universal video segmentation
Video segmentation aims to segment and track every pixel in diverse scenarios accurately.
In this paper, we present Tube-Link, a versatile framework that addresses multiple core tasks …
In this paper, we present Tube-Link, a versatile framework that addresses multiple core tasks …
TransVOD: end-to-end video object detection with spatial-temporal transformers
Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the
need for many hand-designed components in object detection while demonstrating good …
need for many hand-designed components in object detection while demonstrating good …