SAM 2: Segment anything in images and videos

N Ravi, V Gabeur, YT Hu, R Hu, C Ryali, T Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Segment Anything Model 2 (SAM 2), a foundation model towards solving
promptable visual segmentation in images and videos. We build a data engine, which …

Learning object state changes in videos: An open-world perspective

Z Xue, K Ashutosh, K Grauman - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Object State Changes (OSCs) are pivotal for video understanding. While humans
can effortlessly generalize OSC understanding from familiar to unknown objects, current …

An outlook into the future of egocentric vision

C Plizzari, G Goletto, A Furnari, S Bansal… - International Journal of …, 2024 - Springer
What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …

Video state-changing object segmentation

J Yu, X Li, X Zhao, H Zhang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Daily objects commonly experience state changes. For example, slicing a cucumber
changes its state from whole to sliced. Learning about object state changes in Video Object …

Understanding Video Transformers via Universal Concept Discovery

M Kowal, A Dave, R Ambrus… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper studies the problem of concept-based interpretability of transformer
representations for videos. Concretely, we seek to explain the decision-making process of …

Point-VOS: Pointing Up Video Object Segmentation

S Mahadevan, IE Zulfikar… - Proceedings of the …, 2024 - openaccess.thecvf.com
Current state-of-the-art Video Object Segmentation (VOS) methods rely on dense per-object
mask annotations both during training and testing. This requires time-consuming and costly …

SAM2Long: Enhancing SAM 2 for long video segmentation with a training-free memory tree

S Ding, R Qian, X Dong, P Zhang, Y Zang… - arXiv preprint arXiv …, 2024 - arxiv.org
The Segment Anything Model 2 (SAM 2) has emerged as a powerful foundation model for
object segmentation in both images and videos, paving the way for various downstream …

Learning to Segment Referred Objects from Narrated Egocentric Videos

Y Shen, H Wang, X Yang, M Feiszli… - Proceedings of the …, 2024 - openaccess.thecvf.com
Egocentric videos provide a first-person perspective of the wearer's activities involving
simultaneous interactions with multiple objects. In this work, we propose the task of weakly …

RMem: Restricted Memory Banks Improve Video Object Segmentation

J Zhou, Z Pang, YX Wang - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
With recent video object segmentation (VOS) benchmarks evolving to challenging scenarios,
we revisit a simple but overlooked strategy: restricting the size of memory banks. This …

ActionVOS: Actions as prompts for video object segmentation

L Ouyang, R Liu, Y Huang, R Furuta, Y Sato - European Conference on …, 2024 - Springer
Delving into the realm of egocentric vision, the advancement of referring video object
segmentation (RVOS) stands as pivotal in understanding human activities. However …