Deep learning for video object segmentation: a review
As one of the fundamental problems in the field of video understanding, video object
segmentation aims at segmenting objects of interest throughout the given video sequence …
SAM 2: Segment anything in images and videos
We present Segment Anything Model 2 (SAM 2), a foundation model towards solving
promptable visual segmentation in images and videos. We build a data engine, which …
Universal instance perception as object discovery and retrieval
All instance perception tasks aim at finding certain objects specified by some queries such
as category names, language expressions, and target annotations, but this complete field …
XMem: Long-term video object segmentation with an Atkinson-Shiffrin memory model
We present XMem, a video object segmentation architecture for long videos with unified
feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video …
Segment and track anything
This report presents a framework called Segment And Track Anything (SAMTrack) that
allows users to precisely and effectively segment and track any object in a video …
MOSE: A new dataset for video object segmentation in complex scenes
Video object segmentation (VOS) aims at segmenting a particular object throughout the
entire video clip sequence. The state-of-the-art VOS methods have achieved excellent …
Visual semantic segmentation based on few/zero-shot learning: An overview
Visual semantic segmentation aims at separating a visual sample into diverse blocks with
specific semantic attributes and identifying the category for each block, and it plays a crucial …
LAVT: Language-aware vision transformer for referring image segmentation
Referring image segmentation is a fundamental vision-language task that aims to segment
out an object referred to by a natural language expression from an image. One of the key …
DropMAE: Masked autoencoders with spatial-attention dropout for tracking tasks
In this paper, we study masked autoencoder (MAE) pretraining on videos for matching-
based downstream tasks, including visual object tracking (VOT) and video object …
Towards grand unification of object tracking
We present a unified method, termed Unicorn, that can simultaneously solve four tracking
problems (SOT, MOT, VOS, MOTS) with a single network using the same model parameters …