Lotus: Diffusion-based visual foundation model for high-quality dense prediction

J He, H Li, W Yin, Y Liang, L Li, K Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Leveraging the visual priors of pre-trained text-to-image diffusion models offers a promising
solution to enhance zero-shot generalization in dense prediction tasks. However, existing …

TAPTRv2: Attention-based position update improves tracking any point

H Li, H Zhang, S Liu, Z Zeng, F Li, T Ren, B Li… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we present TAPTRv2, a Transformer-based approach built upon TAPTR for
solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DEtection …
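
The snippet breaks off at the reference to DETR (DEtection TRansformer), whose query-refinement pattern the title's "attention-based position update" echoes. The sketch below is only a generic illustration of that pattern, not TAPTRv2's actual module; the class name, offset scaling, and dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AttnPositionUpdate(nn.Module):
    """Generic DETR-style refinement of a tracked point's location (illustrative only)."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.to_delta = nn.Linear(dim, 2)  # predict a (dx, dy) offset per point

    def forward(self, query, pos, frame_tokens):
        # query:        (B, N, dim) content features of the N tracked points
        # pos:          (B, N, 2)   current normalized point locations in [0, 1]
        # frame_tokens: (B, HW, dim) flattened per-frame image features
        attended, _ = self.cross_attn(query, frame_tokens, frame_tokens)
        query = query + attended                       # refresh point content via attention
        pos = pos + 0.1 * self.to_delta(query).tanh()  # attention-driven position update
        return query, pos.clamp(0.0, 1.0)
```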

Robo-GS: A physics consistent spatial-temporal model for robotic arm with hybrid representation

H Lou, Y Liu, Y Pan, Y Geng, J Chen, W Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
Real2Sim2Real plays a critical role in robotic arm control and reinforcement learning, yet
bridging this gap remains a significant challenge due to the complex physical properties of …

Align3R: Aligned Monocular Depth Estimation for Dynamic Videos

J Lu, T Huang, P Li, Z Dou, C Lin, Z Cui, Z Dong… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent developments in monocular depth estimation methods enable high-quality depth
estimation of single-view images but fail to estimate consistent video depth across different …

ObjCtrl-2.5D: Training-free Object Control with Camera Poses

Z Wang, Y Lan, S Zhou, CC Loy - arXiv preprint arXiv:2412.07721, 2024 - arxiv.org
This study aims to achieve more precise and versatile object control in image-to-video (I2V)
generation. Current methods typically represent the spatial movement of target objects with …

Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

H Jeong, CHP Huang, JC Ye, N Mitra… - arXiv preprint arXiv …, 2024 - arxiv.org
While recent foundational video generators produce visually rich output, they still struggle
with appearance drift, where objects gradually degrade or change inconsistently across …

ProTracker: Probabilistic Integration for Robust and Accurate Point Tracking

T Zhang, C Wang, Z Dou, Q Gao, J Lei, B Chen… - arXiv preprint arXiv …, 2025 - arxiv.org
In this paper, we propose ProTracker, a novel framework for robust and accurate long-term
dense tracking of arbitrary points in videos. The key idea of our method is incorporating …

Trajectory-aligned Space-time Tokens for Few-shot Action Recognition

P Kumar, N Padmanabhan, L Luo… - … on Computer Vision, 2024 - Springer
We propose a simple yet effective approach for few-shot action recognition, emphasizing the
disentanglement of motion and appearance representations. By harnessing recent progress …

Towards Robust Automation of Surgical Systems via Digital Twin-based Scene Representations from Foundation Models

H Ding, L Seenivasan, H Shu, G Byrd, H Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language model (LLM)-based agents are emerging as a powerful enabler of robust
embodied intelligence due to their capability of planning complex action sequences. Sound …

Hybrid Cost Volume for Memory-Efficient Optical Flow

Y Zhao, G Xu, G Wu - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
Current state-of-the-art flow methods are mostly based on dense all-pairs cost volumes.
However, as image resolution increases, the computational and spatial complexity of …
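
The truncated sentence is pointing at the quadratic cost of all-pairs matching. As a rough illustration of why resolution is the bottleneck, here is a minimal sketch of a RAFT-style all-pairs correlation volume, not the paper's memory-efficient hybrid design; the tensor names and the 1/8-resolution figures are assumptions.

```python
# Hypothetical illustration: a dense all-pairs cost volume. For feature maps
# with H*W positions, the volume stores one similarity per pixel pair, so its
# memory footprint grows quadratically in H*W.
import torch

def all_pairs_cost_volume(feat1: torch.Tensor, feat2: torch.Tensor) -> torch.Tensor:
    """feat1, feat2: (B, C, H, W) feature maps of the two frames.
    Returns (B, H*W, H*W) dot-product similarities."""
    b, c, h, w = feat1.shape
    f1 = feat1.flatten(2)                        # (B, C, H*W)
    f2 = feat2.flatten(2)                        # (B, C, H*W)
    corr = torch.einsum('bci,bcj->bij', f1, f2)  # (B, H*W, H*W)
    return corr / c ** 0.5

# Rough scale of the problem: 1/8-resolution features of a 1080p frame give
# 240 * 135 = 32,400 positions, so the volume alone holds 32,400^2 ≈ 1.05e9
# entries ≈ 4.2 GB in fp32, before any correlation pyramid levels are added.
```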