- Academic Search

T Fischer, Y Liu, A Jesslen, N Ahmed, P Kaushik… - … on Computer Vision, 2024 - Springer

Different from human nature, it is still common practice today for vision tasks to train deep
learning models only initially and on fixed datasets. A variety of approaches have recently …

Cited by 2, all 11 versions

[PDF] arxiv.org

ImageNet3D: Towards general-purpose object-level 3D understanding

W Ma, G Zeng, G Zhang, Q Liu, L Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

A vision model with general-purpose object-level 3D understanding should be capable of
inferring both 2D (e.g., class name and bounding box) and 3D information (e.g., 3D location …

Cited by 4, all 6 versions

[PDF] thecvf.com

Neural textured deformable meshes for robust analysis-by-synthesis

A Wang, W Ma, A Yuille… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Human vision demonstrates higher robustness than current AI algorithms under
out-of-distribution scenarios. It has been conjectured such robustness benefits from performing …

Cited by 6, all 8 versions

[PDF] arxiv.org

Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering

X Wang, W Ma, A Wang, S Chen, A Kortylewski… - arXiv preprint arXiv …, 2024 - arxiv.org

For vision-language models (VLMs), understanding the dynamic properties of objects and
their interactions within 3D scenes from video is crucial for effective reasoning. In this work …

Cited by 1, all 4 versions

[PDF] arxiv.org

Learning a Category-level Object Pose Estimator without Pose Annotations

F Tian, Y Liu, A Kortylewski, Y Duan, S Du… - arXiv preprint arXiv …, 2024 - arxiv.org

3D object pose estimation is a challenging task. Previous works always require thousands of
object images with annotated poses for learning the 3D pose correspondence, which is …

All 4 versions

[PDF] arxiv.org

Latent Enhancing Autoencoder for Occluded Image Classification

K Kotwal, T Deshmukh, P Gopal - 2024 IEEE International …, 2024 - ieeexplore.ieee.org

Large occlusions result in a significant decline in image classification accuracy. During
inference, diverse types of unseen occlusions introduce out-of-distribution data to the …

All 4 versions

[PDF] openreview.net

Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering

X Wang, W Ma, A Wang, S Chen, A Kortylewski… - … Conference on Learning … - openreview.net

For vision-language models (VLMs), understanding the dynamic properties of objects and
their interactions in 3D scenes from videos is crucial for effective reasoning about high-level …

Robust 3d-aware object classification via discriminative render-and-compare

iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning‏
