SAM 2: Segment anything in images and videos
We present Segment Anything Model 2 (SAM 2), a foundation model towards solving
promptable visual segmentation in images and videos. We build a data engine, which …
Transformer-based visual segmentation: A survey
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …
Putting the object back into video object segmentation
We present Cutie, a video object segmentation (VOS) network with object-level memory
reading which puts the object representation from memory back into the video object …
Omnitokenizer: A joint image-video tokenizer for visual generation
Tokenizer, serving as a translator to map the intricate visual data into a compact latent
space, lies at the core of visual generative models. Based on the finding that existing …
Omnivid: A generative framework for universal video understanding
The core of video understanding tasks, such as recognition, captioning, and tracking, is to
automatically detect objects or actions in a video and analyze their temporal evolution …
Vitron: A unified pixel-level vision LLM for understanding, generating, segmenting, editing
Recent developments of vision large language models (LLMs) have seen remarkable
progress, yet still encounter challenges towards multimodal generalists, such as coarse …
Chatvideo: A tracklet-centric multimodal and versatile video understanding system
Existing deep video models are limited by specific tasks, fixed input-output spaces, and poor
generalization capabilities, making it difficult to deploy them in real-world scenarios. In this …
Time does tell: Self-supervised time-tuning of dense image representations
Spatially dense self-supervised learning is a rapidly growing problem domain with
promising applications for unsupervised segmentation and pretraining for dense …
Exploring pre-trained text-to-video diffusion models for referring video object segmentation
In this paper, we explore the visual representations produced from a pre-trained text-to-
video (T2V) diffusion model for video understanding tasks. We hypothesize that the latent …
Joint modeling of feature, correspondence, and a compressed memory for video object segmentation
Current prevailing Video Object Segmentation (VOS) methods usually perform dense
matching between the current and reference frames after extracting their features. One on …