Universal instance perception as object discovery and retrieval
All instance perception tasks aim at finding certain objects specified by some queries such
as category names, language expressions, and target annotations, but this complete field …
Transformer-based visual segmentation: A survey
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …
MeViS: A large-scale benchmark for video segmentation with motion expressions
This paper strives for motion expressions guided video segmentation, which focuses on
segmenting objects in video content based on a sentence describing the motion of the …
General object foundation model for images and videos at scale
We present GLEE in this work, an object-level foundation model for locating and identifying
objects in images and videos. Through a unified framework, GLEE accomplishes detection …
Decoupling static and hierarchical motion perception for referring video segmentation
Referring video segmentation relies on natural language expressions to identify and
segment objects, often emphasizing motion clues. Previous works treat a sentence as a …
Tube-Link: A flexible cross tube framework for universal video segmentation
Video segmentation aims to segment and track every pixel in diverse scenarios accurately.
In this paper, we present Tube-Link, a versatile framework that addresses multiple core tasks …
DVIS: Decoupled video instance segmentation framework
Video instance segmentation (VIS) is a critical task with diverse applications, including
autonomous driving and video editing. Existing methods often underperform on complex …
Spectrum-guided multi-granularity referring video object segmentation
Current referring video object segmentation (R-VOS) techniques extract conditional kernels
from encoded (low-resolution) vision-language features to segment the decoded high …
CTVIS: Consistent training for online video instance segmentation
The discrimination of instance embeddings plays a vital role in associating instances across
time for online video instance segmentation (VIS). Instance embedding learning is directly …
SOC: Semantic-assisted object cluster for referring video object segmentation
This paper studies referring video object segmentation (RVOS) by boosting video-level
visual-linguistic alignment. Recent approaches model the RVOS task as a sequence …