Universal instance perception as object discovery and retrieval
All instance perception tasks aim at finding certain objects specified by some queries such
as category names, language expressions, and target annotations, but this complete field …
Visual semantic segmentation based on few/zero-shot learning: An overview
Visual semantic segmentation aims at separating a visual sample into diverse blocks with
specific semantic attributes and identifying the category for each block, and it plays a crucial …
OnlineRefer: A simple online baseline for referring video object segmentation
Referring video object segmentation (RVOS) aims at segmenting an object in a video
following human instruction. Current state-of-the-art methods fall into an offline pattern, in …
Spectrum-guided multi-granularity referring video object segmentation
Current referring video object segmentation (R-VOS) techniques extract conditional kernels
from encoded (low-resolution) vision-language features to segment the decoded high …
A comprehensive survey on video saliency detection with auditory information: the audio-visual consistency perceptual is the key!
Video saliency detection (VSD) aims at quickly locating the most attractive
objects/things/patterns in a given video clip. Existing VSD-related works have mainly relied …
Robust referring video object segmentation with cyclic structural consensus
Referring Video Object Segmentation (R-VOS) is a challenging task that aims to
segment an object in a video based on a linguistic expression. Most existing R-VOS …
Local-global context aware transformer for language-guided video segmentation
We explore the task of language-guided video segmentation (LVS). Previous algorithms
mostly adopt 3D CNNs to learn video representation, struggling to capture long-term context …
Segment every reference object in spatial and temporal spaces
The reference-based object segmentation tasks, namely referring image segmentation
(RIS), referring video object segmentation (RVOS), and video object segmentation (VOS) …
Self-supervised pretraining for RGB-D salient object detection
Existing CNN-based RGB-D salient object detection (SOD) networks are all
required to be pretrained on ImageNet to learn the hierarchical features which helps …
Decoupling static and hierarchical motion perception for referring video segmentation
Referring video segmentation relies on natural language expressions to identify and
segment objects, often emphasizing motion clues. Previous works treat a sentence as a …