Foundation Models Defining a New Era in Vision: a Survey and Outlook

M Awais, M Naseer, S Khan, RM Anwer… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org
Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

Video object segmentation and tracking: A survey

R Yao, G Lin, S Xia, J Zhao, Y Zhou - ACM Transactions on Intelligent …, 2020 - dl.acm.org
Object segmentation and object tracking are fundamental research areas in the computer
vision community. These two topics share some common challenges that are difficult to handle, such …

Universal instance perception as object discovery and retrieval

B Yan, Y Jiang, J Wu, D Wang, P Luo… - Proceedings of the …, 2023 - openaccess.thecvf.com
All instance perception tasks aim at finding certain objects specified by some queries such
as category names, language expressions, and target annotations, but this complete field …

Tracking anything with decoupled video segmentation

HK Cheng, SW Oh, B Price… - Proceedings of the …, 2023 - openaccess.thecvf.com
Training data for video segmentation are expensive to annotate. This impedes extensions of
end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary …

VRT: A video restoration transformer

J Liang, J Cao, Y Fan, K Zhang… - … on Image Processing, 2024 - ieeexplore.ieee.org
Video restoration aims to restore high-quality frames from low-quality frames. Different from
single image restoration, video restoration generally requires utilizing temporal information …

MeViS: A large-scale benchmark for video segmentation with motion expressions

H Ding, C Liu, S He, X Jiang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper strives for motion expressions guided video segmentation, which focuses on
segmenting objects in video content based on a sentence describing the motion of the …

OMG-LLaVA: Bridging image-level, object-level, pixel-level reasoning and understanding

T Zhang, X Li, H Fei, H Yuan, S Wu… - Advances in …, 2025 - proceedings.neurips.cc
Current universal segmentation methods demonstrate strong capabilities in pixel-level
image and video understanding. However, they lack reasoning abilities and cannot be …

Recurrent video restoration transformer with guided deformable attention

J Liang, Y Fan, X Xiang, R Ranjan… - Advances in …, 2022 - proceedings.neurips.cc
Video restoration aims at restoring multiple high-quality frames from multiple low-quality
frames. Existing video restoration methods generally fall into two extreme cases, i.e., they …

Referring multi-object tracking

D Wu, W Han, T Wang, X Dong… - Proceedings of the …, 2023 - openaccess.thecvf.com
Existing referring understanding tasks tend to involve the detection of a single text-referred
object. In this paper, we propose a new and general referring understanding task, termed …

Matting anything

J Li, J Jain, H Shi - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
In this paper, we propose the Matting Anything Model (MAM), an efficient and versatile
framework for estimating the alpha matte of any instance in an image with flexible and …