Foundation Models Defining a New Era in Vision: a Survey and Outlook
Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …
Video object segmentation and tracking: A survey
Object segmentation and object tracking are fundamental research areas in the computer
vision community. These two topics are difficult to handle some common challenges, such …
Universal instance perception as object discovery and retrieval
All instance perception tasks aim at finding certain objects specified by some queries such
as category names, language expressions, and target annotations, but this complete field …
Tracking anything with decoupled video segmentation
Training data for video segmentation are expensive to annotate. This impedes extensions of
end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary …
VRT: A video restoration transformer
Video restoration aims to restore high-quality frames from low-quality frames. Different from
single image restoration, video restoration generally requires to utilize temporal information …
MeViS: A large-scale benchmark for video segmentation with motion expressions
This paper strives for motion expressions guided video segmentation, which focuses on
segmenting objects in video content based on a sentence describing the motion of the …
OMG-LLaVA: Bridging image-level, object-level, pixel-level reasoning and understanding
Current universal segmentation methods demonstrate strong capabilities in pixel-level
image and video understanding. However, they lack reasoning abilities and cannot be …
Recurrent video restoration transformer with guided deformable attention
Video restoration aims at restoring multiple high-quality frames from multiple low-quality
frames. Existing video restoration methods generally fall into two extreme cases, i.e., they …
Referring multi-object tracking
Existing referring understanding tasks tend to involve the detection of a single text-referred
object. In this paper, we propose a new and general referring understanding task, termed …
Matting anything
In this paper, we propose the Matting Anything Model (MAM), an efficient and versatile
framework for estimating the alpha matte of any instance in an image with flexible and …