- Academic Search

Improving 2D feature representations by 3D-aware fine-tuning

Y Yue, A Das, F Engelmann, S Tang… - European Conference on …, 2024 - Springer

Current visual foundation models are trained purely on unstructured 2D data, limiting their
understanding of the 3D structure of objects and scenes. In this work, we show that fine-tuning …

Cited by 11 · All 11 versions

StableNormal: Reducing diffusion variance for stable and sharp normal

C Ye, L Qiu, X Gu, Q Zuo, Y Wu, Z Dong, L Bo… - ACM Transactions on …, 2024 - dl.acm.org

This work addresses the challenge of high-quality surface normal estimation from monocular
colored inputs (i.e., images and videos), a field which has recently been revolutionized by …

Cited by 21 · All 3 versions


[PDF] thecvf.com

Real-time 4K super-resolution of compressed AVIF images. AIS 2024 challenge survey

MV Conde, Z Lei, W Li… - Proceedings of the …, 2024 - openaccess.thecvf.com

This paper introduces a novel benchmark for efficient image upscaling as part of the AIS
2024 Real-Time Image Super-Resolution (RTSR) Challenge which aims to upscale …

Cited by 10 · All 3 versions · HTML version


[PDF] arxiv.org

LiFT: A surprisingly simple lightweight feature transform for dense ViT descriptors

S Suri, M Walmer, K Gupta, A Shrivastava - European Conference on …, 2024 - Springer

We present a simple self-supervised method to enhance the performance of ViT features for
dense downstream tasks. Our Lightweight Feature Transform (LiFT) is a straightforward and …

Cited by 3 · All 4 versions


[PDF] arxiv.org

SegEarth-OV: Towards training-free open-vocabulary segmentation for remote sensing images

K Li, R Liu, X Cao, X Bai, F Zhou, D Meng… - arXiv preprint arXiv …, 2024 - arxiv.org

Remote sensing imagery plays an irreplaceable role in fields such as agriculture, water
resources, military, and disaster relief. Pixel-level interpretation is a critical aspect of remote …

Cited by 4 · All 3 versions · HTML version


[PDF] arxiv.org

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

Y Zhang, Y Liu, Z Guo, Y Zhang, X Yang… - arXiv preprint arXiv …, 2024 - arxiv.org

In multimodal large language models (MLLMs), vision transformers (ViTs) are widely
employed for visual encoding. However, their performance in solving universal MLLM tasks …

Cited by 1 · All 2 versions · HTML version


[PDF] arxiv.org

Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

H Jeong, CHP Huang, JC Ye, N Mitra… - arXiv preprint arXiv …, 2024 - arxiv.org

While recent foundational video generators produce visually rich output, they still struggle
with appearance drift, where objects gradually degrade or change inconsistently across …

Cited by 1 · HTML version


[PDF] arxiv.org

Keypoint Abstraction using Large Models for Object-Relative Imitation Learning

X Fang, BR Huang, J Mao, J Shone… - arXiv preprint arXiv …, 2024 - arxiv.org

Generalization to novel object configurations and instances across diverse tasks and
environments is a critical challenge in robotics. Keypoint-based representations have been …

Cited by 1 · All 2 versions · HTML version


[PDF] arxiv.org

SAMPart3D: Segment any part in 3D objects

Y Yang, Y Huang, YC Guo, L Lu, X Wu, EY Lam… - arXiv preprint arXiv …, 2024 - arxiv.org

3D part segmentation is a crucial and challenging task in 3D perception, playing a vital role
in applications such as robotics, 3D generation, and 3D editing. Recent methods harness …

Cited by 1 · HTML version


[PDF] arxiv.org

A refreshed similarity-based upsampler for direct high-ratio feature upsampling

M Zhou, H Wang, Y Zheng, D Meng - arXiv preprint arXiv:2407.02283, 2024 - arxiv.org

Feature upsampling is a fundamental and indispensable ingredient of almost all current
network structures for image segmentation tasks. Recently, a popular similarity-based …

Cited by 1 · All 2 versions · HTML version
