Test-time training with masked autoencoders
Test-time training adapts to a new test distribution on the fly by optimizing a model for each
test input using self-supervision. In this paper, we use masked autoencoders for this one …
TTT++: When does self-supervised test-time training fail or thrive?
Test-time training (TTT) through self-supervised learning (SSL) is an emerging paradigm to
tackle distributional shifts. Despite encouraging results, it remains unclear when this …
MinVIS: A minimal video instance segmentation framework without video-based training
We propose MinVIS, a minimal video instance segmentation (VIS) framework that achieves
state-of-the-art VIS performance with neither video-based architectures nor training …
A survey on deep learning technique for video segmentation
Video segmentation—partitioning video frames into multiple segments or objects—plays a
critical role in a broad range of practical applications, from enhancing visual effects in movie …
Do different tracking tasks require different appearance models?
Tracking objects of interest in a video is one of the most popular and widely applicable
problems in computer vision. However, with the years, a Cambrian explosion of use cases …
Dynamically instance-guided adaptation: A backward-free approach for test-time domain adaptive semantic segmentation
In this paper, we study the application of Test-time domain adaptation in semantic
segmentation (TTDA-Seg) where both efficiency and effectiveness are crucial. Existing …
Mask-free video instance segmentation
The recent advancement in Video Instance Segmentation (VIS) has largely been driven by
the use of deeper and increasingly data-hungry transformer-based models. However, video …
End-to-end 3D tracking with decoupled queries
In this work, we present an end-to-end framework for camera-based 3D multi-object tracking,
called DQTrack. To avoid heuristic design in detection-based trackers, recent query-based …
A gated attention transformer for multi-person pose tracking
Multi-person pose tracking is an important element for many applications and requires to
estimate the human poses of all persons in a video and to track them over time. The …
What is Point Supervision Worth in Video Instance Segmentation?
Video instance segmentation (VIS) is a challenging vision task that aims to detect, segment,
and track objects in videos. Conventional VIS methods rely on densely annotated object …