DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video

N Tumanyan, A Singer, S Bagon, T Dekel - European Conference on …, 2024 - Springer
We present DINO-Tracker, a new framework for long-term dense tracking in video. The pillar
of our approach is combining test-time training on a single video with the powerful localized …

RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation

Y Kuang, J Ye, H Geng, J Mao, C Deng… - arXiv preprint arXiv …, 2024 - arxiv.org
This work proposes a retrieve-and-transfer framework for zero-shot robotic manipulation,
dubbed RAM, featuring generalizability across various objects, environments, and …

Diffusion models and representation learning: A survey

M Fuest, P Ma, M Gui, JS Fischer, VT Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion Models are popular generative modeling methods in various vision tasks, attracting
significant attention. They can be considered a unique instance of self-supervised learning …

Improving semantic correspondence with viewpoint-guided spherical maps

O Mariotti, O Mac Aodha… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recent self-supervised models produce visual features that are effective at encoding not only
image-level but also pixel-level semantics. They have been reported to obtain …

Can Visual Foundation Models Achieve Long-term Point Tracking?

G Aydemir, W **e, F Güney - arxiv preprint arxiv:2408.13575, 2024 - arxiv.org
Large-scale vision foundation models have demonstrated remarkable success across
various tasks, underscoring their robust generalization capabilities. While their proficiency in …

Law of Vision Representation in MLLMs

S Yang, B Zhai, Q You, J Yuan, H Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present the "Law of Vision Representation" in multimodal large language models
(MLLMs). It reveals a strong correlation between the combination of cross-modal alignment …

Toward a Holistic Evaluation of Robustness in CLIP Models

W Tu, W Deng, T Gedeon - arXiv preprint arXiv:2410.01534, 2024 - arxiv.org
Contrastive Language-Image Pre-training (CLIP) models have shown significant potential,
particularly in zero-shot classification across diverse distribution shifts. Building on existing …

Click to Grasp: Zero-Shot Precise Manipulation via Visual Diffusion Descriptors

N Tsagkas, J Rome, S Ramamoorthy… - 2024 IEEE/RSJ …, 2024 - ieeexplore.ieee.org
Precise manipulation that is generalizable across scenes and objects remains a persistent
challenge in robotics. Current approaches for this task heavily depend on having a …

CleanDIFT: Diffusion Features without Noise

N Stracke, SA Baumann, K Bauer, F Fundel… - arXiv preprint arXiv …, 2024 - arxiv.org
Internal features from large-scale pre-trained diffusion models have recently been
established as powerful semantic descriptors for a wide range of downstream tasks. Works …

ProTracker: Probabilistic Integration for Robust and Accurate Point Tracking

T Zhang, C Wang, Z Dou, Q Gao, J Lei, B Chen… - arXiv preprint arXiv …, 2025 - arxiv.org
In this paper, we propose ProTracker, a novel framework for robust and accurate long-term
dense tracking of arbitrary points in videos. The key idea of our method is to incorporate …