Lotus: Diffusion-based visual foundation model for high-quality dense prediction
Leveraging the visual priors of pre-trained text-to-image diffusion models offers a promising
solution to enhance zero-shot generalization in dense prediction tasks. However, existing …
TAPTRv2: Attention-based position update improves tracking any point
In this paper, we present TAPTRv2, a Transformer-based approach built upon TAPTR for
solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DEtection …
Robo-GS: A physics consistent spatial-temporal model for robotic arm with hybrid representation
Real2Sim2Real plays a critical role in robotic arm control and reinforcement learning, yet
bridging this gap remains a significant challenge due to the complex physical properties of …
Align3R: Aligned Monocular Depth Estimation for Dynamic Videos
Recent developments in monocular depth estimation methods enable high-quality depth
estimation of single-view images but fail to estimate consistent video depth across different …
ObjCtrl-2.5D: Training-free Object Control with Camera Poses
This study aims to achieve more precise and versatile object control in image-to-video (I2V)
generation. Current methods typically represent the spatial movement of target objects with …
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation
While recent foundational video generators produce visually rich output, they still struggle
with appearance drift, where objects gradually degrade or change inconsistently across …
ProTracker: Probabilistic Integration for Robust and Accurate Point Tracking
In this paper, we propose ProTracker, a novel framework for robust and accurate long-term
dense tracking of arbitrary points in videos. The key idea of our method is incorporating …
Trajectory-aligned Space-time Tokens for Few-shot Action Recognition
We propose a simple yet effective approach for few-shot action recognition, emphasizing the
disentanglement of motion and appearance representations. By harnessing recent progress …
Towards Robust Automation of Surgical Systems via Digital Twin-based Scene Representations from Foundation Models
Large language model (LLM)-based agents are emerging as a powerful enabler of robust
embodied intelligence due to their capability of planning complex action sequences. Sound …
Hybrid Cost Volume for Memory-Efficient Optical Flow
Current state-of-the-art flow methods are mostly based on dense all-pairs cost volumes.
However, as image resolution increases, the computational and spatial complexity of …