Moviechat: From dense token to sparse memory for long video understanding
Recently integrating video foundation models and large language models to build a video
understanding system can overcome the limitations of specific pre-defined vision tasks. Yet …
understanding system can overcome the limitations of specific pre-defined vision tasks. Yet …
Exploring object-centric temporal modeling for efficient multi-view 3d object detection
In this paper, we propose a long-sequence modeling framework, named StreamPETR, for
multi-view 3D object detection. Built upon the sparse query design in the PETR series, we …
multi-view 3D object detection. Built upon the sparse query design in the PETR series, we …
Bytetrack: Multi-object tracking by associating every detection box
Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects in
videos. Most methods obtain identities by associating detection boxes whose scores are …
videos. Most methods obtain identities by associating detection boxes whose scores are …
Observation-centric sort: Rethinking sort for robust multi-object tracking
Kalman filter (KF) based methods for multi-object tracking (MOT) make an assumption that
objects move linearly. While this assumption is acceptable for very short periods of …
objects move linearly. While this assumption is acceptable for very short periods of …
Motrv2: Bootstrap** end-to-end multi-object tracking by pretrained object detectors
In this paper, we propose MOTRv2, a simple yet effective pipeline to bootstrap end-to-end
multi-object tracking with a pretrained object detector. Existing end-to-end methods, eg …
multi-object tracking with a pretrained object detector. Existing end-to-end methods, eg …
Motiontrack: Learning robust short-term and long-term motions for multi-object tracking
The main challenge of Multi-Object Tracking (MOT) lies in maintaining a continuous
trajectory for each target. Existing methods often learn reliable motion patterns to match the …
trajectory for each target. Existing methods often learn reliable motion patterns to match the …
Unifying short and long-term tracking with graph hierarchies
Tracking objects over long videos effectively means solving a spectrum of problems, from
short-term association for un-occluded objects to long-term association for objects that are …
short-term association for un-occluded objects to long-term association for objects that are …
MeMOTR: Long-term memory-augmented transformer for multi-object tracking
As a video task, Multiple Object Tracking (MOT) is expected to capture temporal information
of targets effectively. Unfortunately, most existing methods only explicitly exploit the object …
of targets effectively. Unfortunately, most existing methods only explicitly exploit the object …
Maptracker: Tracking with strided memory fusion for consistent vector hd map**
This paper presents a vector HD-map** algorithm that formulates the map** as a
tracking task and uses a history of memory latents to ensure consistent reconstructions over …
tracking task and uses a history of memory latents to ensure consistent reconstructions over …
A generalized framework for video instance segmentation
The handling of long videos with complex and occluded sequences has recently emerged
as a new challenge in the video instance segmentation (VIS) community. However, existing …
as a new challenge in the video instance segmentation (VIS) community. However, existing …