SAM 2: Segment anything in images and videos
We present Segment Anything Model 2 (SAM 2), a foundation model towards solving
promptable visual segmentation in images and videos. We build a data engine, which …
Learning object state changes in videos: An open-world perspective
Object State Changes (OSCs) are pivotal for video understanding. While humans
can effortlessly generalize OSC understanding from familiar to unknown objects, current …
An outlook into the future of egocentric vision
What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …
Video state-changing object segmentation
Daily objects commonly experience state changes. For example, slicing a cucumber
changes its state from whole to sliced. Learning about object state changes in Video Object …
Understanding Video Transformers via Universal Concept Discovery
This paper studies the problem of concept-based interpretability of transformer
representations for videos. Concretely, we seek to explain the decision-making process of …
Point-VOS: Pointing Up Video Object Segmentation
Current state-of-the-art Video Object Segmentation (VOS) methods rely on dense per-object
mask annotations both during training and testing. This requires time-consuming and costly …
SAM2Long: Enhancing SAM 2 for long video segmentation with a training-free memory tree
The Segment Anything Model 2 (SAM 2) has emerged as a powerful foundation model for
object segmentation in both images and videos, paving the way for various downstream …
Learning to Segment Referred Objects from Narrated Egocentric Videos
Egocentric videos provide a first-person perspective of the wearer's activities involving
simultaneous interactions with multiple objects. In this work, we propose the task of weakly …
RMem: Restricted Memory Banks Improve Video Object Segmentation
With recent video object segmentation (VOS) benchmarks evolving to challenging scenarios,
we revisit a simple but overlooked strategy: restricting the size of memory banks. This …
ActionVOS: Actions as prompts for video object segmentation
Delving into the realm of egocentric vision, the advancement of referring video object
segmentation (RVOS) stands as pivotal in understanding human activities. However …