Uni3D: Exploring unified 3D representation at scale

J Zhou, J Wang, B Ma, YS Liu, T Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Scaling up representations for images or text has been extensively investigated in the past
few years and has led to revolutions in learning vision and language. However, scalable …

OmniBind: Teach to build unequal-scale modality interaction for omni-bind of all

Y Lyu, X Zheng, D Kim, L Wang - arXiv preprint arXiv:2405.16108, 2024 - arxiv.org
Research on multi-modal learning dominantly aligns the modalities in a unified space at
training, and only a single one is taken for prediction at inference. However, for a real …

Sculpting holistic 3D representation in contrastive language-image-3D pre-training

Y Gao, Z Wang, WS Zheng, C Xie… - Proceedings of the …, 2024 - openaccess.thecvf.com
Contrastive learning has emerged as a promising paradigm for 3D open-world
understanding, i.e., aligning point cloud representations to the image and text embedding space …

Artificial human intelligence: The role of humans in the development of next generation AI

SS Arslan - arXiv preprint arXiv:2409.16001, 2024 - arxiv.org
Human intelligence, the most evident and accessible source of reasoning, hosted by
biological hardware, has evolved and been refined over thousands of years, positioning …

Duoduo CLIP: Efficient 3D Understanding with Multi-View Images

HH Lee, Y Zhang, AX Chang - arXiv preprint arXiv:2406.11579, 2024 - arxiv.org
We introduce Duoduo CLIP, a model for 3D representation learning that learns shape
encodings from multi-view images instead of point-clouds. The choice of multi-view images …

Signal Processing for Haptic Surface Modeling: a Review

AL Stefani, N Bisagno, A Rosani, N Conci… - arXiv preprint arXiv …, 2024 - arxiv.org
Haptic feedback has been integrated into Virtual and Augmented Reality, complementing
acoustic and visual information and contributing to an all-round immersive experience in …

CREMA: Generalizable and efficient video-language reasoning via multimodal modular fusion

S Yu, J Yoon, M Bansal - arXiv preprint arXiv:2402.05889, 2024 - arxiv.org
Despite impressive advancements in recent multimodal reasoning approaches, they are still
limited in flexibility and efficiency, as these models typically process only a few fixed …

Training-free point cloud recognition based on geometric and semantic information fusion

Y Chen, D Huang, Z Liao, X Cheng, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
The trend of employing training-free methods for point cloud recognition is becoming
increasingly popular due to their significant reduction in computational resources and time …

Visual guided Dual-spatial Interaction Network for Fine-grained Brain Semantic Decoding

J Tang, Y Yang, Q Zhao, Y Ding… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Brain semantic decoding has received a surge of attention in the computer vision and
neuroscience disciplines. However, existing techniques ignore the sparse and implicit …

AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors

R Feng, J Hu, W Xia, T Gao, A Shen, Y Sun… - arXiv preprint arXiv …, 2025 - arxiv.org
Visuo-tactile sensors aim to emulate human tactile perception, enabling robots to precisely
understand and manipulate objects. Over time, numerous meticulously designed visuo …