iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning

T Fischer, Y Liu, A Jesslen, N Ahmed, P Kaushik… - … on Computer Vision, 2024 - Springer
Different from human nature, it is still common practice today for vision tasks to train deep
learning models only initially and on fixed datasets. A variety of approaches have recently …

ImageNet3D: Towards general-purpose object-level 3D understanding

W Ma, G Zeng, G Zhang, Q Liu, L Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
A vision model with general-purpose object-level 3D understanding should be capable of
inferring both 2D (e.g., class name and bounding box) and 3D information (e.g., 3D location …

Neural textured deformable meshes for robust analysis-by-synthesis

A Wang, W Ma, A Yuille… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Human vision demonstrates higher robustness than current AI algorithms under out-of-
distribution scenarios. It has been conjectured such robustness benefits from performing …

Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering

X Wang, W Ma, A Wang, S Chen, A Kortylewski… - arXiv preprint arXiv …, 2024 - arxiv.org
For vision-language models (VLMs), understanding the dynamic properties of objects and
their interactions within 3D scenes from video is crucial for effective reasoning. In this work …

Learning a Category-level Object Pose Estimator without Pose Annotations

F Tian, Y Liu, A Kortylewski, Y Duan, S Du… - arXiv preprint arXiv …, 2024 - arxiv.org
3D object pose estimation is a challenging task. Previous works always require thousands of
object images with annotated poses for learning the 3D pose correspondence, which is …

Latent Enhancing Autoencoder for Occluded Image Classification

K Kotwal, T Deshmukh, P Gopal - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Large occlusions result in a significant decline in image classification accuracy. During
inference, diverse types of unseen occlusions introduce out-of-distribution data to the …

Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering

X Wang, W Ma, A Wang, S Chen, A Kortylewski… - … Conference on Learning … - openreview.net
For vision-language models (VLMs), understanding the dynamic properties of objects and
their interactions in 3D scenes from videos is crucial for effective reasoning about high-level …