Google Scholar

DETRs with hybrid matching

D Jia, Y Yuan, H He, X Wu, H Yu… - Proceedings of the …, 2023 - openaccess.thecvf.com

One-to-one set matching is a key design for DETR to establish its end-to-end capability, so
that object detection does not require a hand-crafted NMS (non-maximum suppression) to …

Cited by 220

MOTRv2: Bootstrapping end-to-end multi-object tracking by pretrained object detectors

Y Zhang, T Wang, X Zhang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

In this paper, we propose MOTRv2, a simple yet effective pipeline to bootstrap end-to-end
multi-object tracking with a pretrained object detector. Existing end-to-end methods, e.g. …

Cited by 152

A simple single-scale vision transformer for object localization and instance segmentation

W Chen, X Du, F Yang, L Beyer, X Zhai, TY Lin… - arXiv preprint arXiv …, 2021 - arxiv.org

This work presents a simple vision transformer design as a strong baseline for object
localization and instance segmentation tasks. Transformers recently demonstrate …

Cited by 224

Language as queries for referring video object segmentation

J Wu, Y Jiang, P Sun, Z Yuan… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

Referring video object segmentation (R-VOS) is an emerging cross-modal task that aims to
segment the target object referred by a language expression in all video frames. In this work …

Cited by 166

End-to-end temporal action detection with transformer

X Liu, Q Wang, Y Hu, X Tang, S Zhang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Temporal action detection (TAD) aims to determine the semantic label and the temporal
interval of every action instance in an untrimmed video. It is a fundamental and challenging …

Cited by 266

MinVIS: A minimal video instance segmentation framework without video-based training

DA Huang, Z Yu, A Anandkumar - Advances in Neural …, 2022 - proceedings.neurips.cc

We propose MinVIS, a minimal video instance segmentation (VIS) framework that achieves
state-of-the-art VIS performance with neither video-based architectures nor training …

Cited by 89

VITA: Video instance segmentation via object token association

M Heo, S Hwang, SW Oh, JY Lee… - Advances in Neural …, 2022 - proceedings.neurips.cc

We introduce a novel paradigm for offline Video Instance Segmentation (VIS), based on the
hypothesis that explicit object-oriented information can be a strong clue for understanding …

Cited by 95

HTML: Hybrid temporal-scale multimodal learning framework for referring video object segmentation

M Han, Y Wang, Z Li, L Yao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Referring Video Object Segmentation (RVOS) is to segment the object instance
from a given video, according to the textual description of this object. However, in the open …

Cited by 27

Temporal collection and distribution for referring video object segmentation

J Tang, G Zheng, S Yang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

Referring video object segmentation aims to segment a referent throughout a video
sequence according to a natural language expression. It requires aligning the natural …

Cited by 22

VISA: Reasoning video object segmentation via large language models

C Yan, H Wang, S Yan, X Jiang, Y Hu, G Kang… - … on Computer Vision, 2024 - Springer

Existing Video Object Segmentation (VOS) relies on explicit user instructions, such
as categories, masks, or short phrases, restricting their ability to perform complex video …

Cited by 14

Saved to library

SeqFormer: a frustratingly simple model for video instance segmentation

DETRs with hybrid matching

MOTRv2: Bootstrapping end-to-end multi-object tracking by pretrained object detectors

A simple single-scale vision transformer for object localization and instance segmentation

Language as queries for referring video object segmentation

End-to-end temporal action detection with transformer

MinVIS: A minimal video instance segmentation framework without video-based training

VITA: Video instance segmentation via object token association

HTML: Hybrid temporal-scale multimodal learning framework for referring video object segmentation

Temporal collection and distribution for referring video object segmentation

VISA: Reasoning video object segmentation via large language models