DETRs with hybrid matching

D Jia, Y Yuan, H He, X Wu, H Yu… - Proceedings of the …, 2023 - openaccess.thecvf.com
One-to-one set matching is a key design for DETR to establish its end-to-end capability, so
that object detection does not require a hand-crafted NMS (non-maximum suppression) to …

MOTRv2: Bootstrapping end-to-end multi-object tracking by pretrained object detectors

Y Zhang, T Wang, X Zhang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
In this paper, we propose MOTRv2, a simple yet effective pipeline to bootstrap end-to-end
multi-object tracking with a pretrained object detector. Existing end-to-end methods, e.g. …

A simple single-scale vision transformer for object localization and instance segmentation

W Chen, X Du, F Yang, L Beyer, X Zhai, TY Lin… - arXiv preprint arXiv …, 2021 - arxiv.org
This work presents a simple vision transformer design as a strong baseline for object
localization and instance segmentation tasks. Transformers recently demonstrate …

Language as queries for referring video object segmentation

J Wu, Y Jiang, P Sun, Z Yuan… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Referring video object segmentation (R-VOS) is an emerging cross-modal task that aims to
segment the target object referred by a language expression in all video frames. In this work …

End-to-end temporal action detection with transformer

X Liu, Q Wang, Y Hu, X Tang, S Zhang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Temporal action detection (TAD) aims to determine the semantic label and the temporal
interval of every action instance in an untrimmed video. It is a fundamental and challenging …

MinVIS: A minimal video instance segmentation framework without video-based training

DA Huang, Z Yu, A Anandkumar - Advances in Neural …, 2022 - proceedings.neurips.cc
We propose MinVIS, a minimal video instance segmentation (VIS) framework that achieves
state-of-the-art VIS performance with neither video-based architectures nor training …

VITA: Video instance segmentation via object token association

M Heo, S Hwang, SW Oh, JY Lee… - Advances in Neural …, 2022 - proceedings.neurips.cc
We introduce a novel paradigm for offline Video Instance Segmentation (VIS), based on the
hypothesis that explicit object-oriented information can be a strong clue for understanding …

HTML: Hybrid temporal-scale multimodal learning framework for referring video object segmentation

M Han, Y Wang, Z Li, L Yao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Referring Video Object Segmentation (RVOS) is to segment the object instance
from a given video, according to the textual description of this object. However, in the open …

Temporal collection and distribution for referring video object segmentation

J Tang, G Zheng, S Yang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Referring video object segmentation aims to segment a referent throughout a video
sequence according to a natural language expression. It requires aligning the natural …

VISA: Reasoning video object segmentation via large language models

C Yan, H Wang, S Yan, X Jiang, Y Hu, G Kang… - … on Computer Vision, 2024 - Springer
Existing Video Object Segmentation (VOS) relies on explicit user instructions, such
as categories, masks, or short phrases, restricting their ability to perform complex video …