Uni3D: Exploring unified 3D representation at scale

J Zhou, J Wang, B Ma, YS Liu, T Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Scaling up representations for images or text has been extensively investigated in the past
few years and has led to revolutions in learning vision and language. However, scalable …

OmniBind: Teach to build unequal-scale modality interaction for omni-bind of all

Y Lyu, X Zheng, D Kim, L Wang - arXiv preprint arXiv:2405.16108, 2024 - arxiv.org
Research on multi-modal learning dominantly aligns the modalities in a unified space at
training, and only a single one is taken for prediction at inference. However, for a real …

Sculpting holistic 3D representation in contrastive language-image-3D pre-training

Y Gao, Z Wang, WS Zheng, C Xie… - Proceedings of the …, 2024 - openaccess.thecvf.com
Contrastive learning has emerged as a promising paradigm for 3D open-world
understanding, i.e., aligning point cloud representations to the image and text embedding space …

Artificial human intelligence: The role of humans in the development of next generation AI

SS Arslan - arXiv preprint arXiv:2409.16001, 2024 - arxiv.org
Human intelligence, the most evident and accessible source of reasoning, hosted by
biological hardware, has evolved and been refined over thousands of years, positioning …

Duoduo CLIP: Efficient 3D Understanding with Multi-View Images

HH Lee, Y Zhang, AX Chang - arXiv preprint arXiv:2406.11579, 2024 - arxiv.org
We introduce Duoduo CLIP, a model for 3D representation learning that learns shape
encodings from multi-view images instead of point-clouds. The choice of multi-view images …

Signal Processing for Haptic Surface Modeling: a Review

AL Stefani, N Bisagno, A Rosani, N Conci… - arXiv preprint arXiv …, 2024 - arxiv.org
Haptic feedback has been integrated into Virtual and Augmented Reality, complementing
acoustic and visual information and contributing to an all-round immersive experience in …

CREMA: Generalizable and efficient video-language reasoning via multimodal modular fusion

S Yu, J Yoon, M Bansal - arXiv preprint arXiv:2402.05889, 2024 - arxiv.org
Despite impressive advancements in recent multimodal reasoning approaches, they are still
limited in flexibility and efficiency, as these models typically process only a few fixed …

Training-free point cloud recognition based on geometric and semantic information fusion

Y Chen, D Huang, Z Liao, X Cheng, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
The trend of employing training-free methods for point cloud recognition is becoming
increasingly popular due to their significant reduction in computational resources and time …

Visual guided Dual-spatial Interaction Network for Fine-grained Brain Semantic Decoding

J Tang, Y Yang, Q Zhao, Y Ding… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Brain semantic decoding has received a surge of attention in the computer vision and
neuroscience disciplines. However, existing techniques ignore the sparse and implicit …

AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors

R Feng, J Hu, W Xia, T Gao, A Shen, Y Sun… - arXiv preprint arXiv …, 2025 - arxiv.org
Visuo-tactile sensors aim to emulate human tactile perception, enabling robots to precisely
understand and manipulate objects. Over time, numerous meticulously designed visuo …