State of the art on diffusion models for visual computing

R Po, W Yifan, V Golyanik, K Aberman… - Computer Graphics …, 2024 - Wiley Online Library
The field of visual computing is rapidly advancing due to the emergence of generative
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …

Real-world robot applications of foundation models: A review

K Kawaharazuka, T Matsushima… - Advanced …, 2024 - Taylor & Francis
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-
Language Models (VLMs), trained on extensive data, facilitate flexible application across …

Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

PointLLM: Empowering large language models to understand point clouds

R Xu, X Wang, T Wang, Y Chen, J Pang… - European Conference on …, 2024 - Springer
The unprecedented advancements in Large Language Models (LLMs) have shown a
profound impact on natural language processing but are yet to fully embrace the realm of 3D …
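
PointLLM and the related 3D multimodal LLMs below share a common recipe: a point-cloud encoder produces features that a small projector maps into the LLM's token-embedding space, where they are interleaved with text tokens. The sketch below is only an illustration of that pattern with placeholder modules and dimensions; it is not the PointLLM architecture, which uses a pretrained transformer point encoder.

```python
# Minimal sketch (not the PointLLM implementation) of projecting point-cloud
# features into an LLM's token-embedding space. Encoder and dimensions are
# illustrative assumptions.
import torch
import torch.nn as nn

class PointCloudProjector(nn.Module):
    def __init__(self, point_feat_dim=384, llm_hidden_dim=4096, num_point_tokens=64):
        super().__init__()
        # Placeholder point encoder: per-point MLP + pooling into a fixed number of
        # "point tokens". Real systems use pretrained transformer point encoders.
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(),
            nn.Linear(128, point_feat_dim),
        )
        self.num_point_tokens = num_point_tokens
        # Linear projector mapping point features into the LLM embedding space.
        self.projector = nn.Linear(point_feat_dim, llm_hidden_dim)

    def forward(self, xyz):                      # xyz: (B, N, 3)
        feats = self.point_mlp(xyz)              # (B, N, point_feat_dim)
        B, N, D = feats.shape
        # Group points into num_point_tokens chunks and max-pool each chunk.
        chunks = feats.view(B, self.num_point_tokens, N // self.num_point_tokens, D)
        tokens = chunks.max(dim=2).values        # (B, num_point_tokens, D)
        return self.projector(tokens)            # (B, num_point_tokens, llm_hidden_dim)

# These pseudo-token embeddings would be concatenated with text token embeddings
# before being fed to a frozen or lightly tuned LLM.
pts = torch.randn(2, 1024, 3)
print(PointCloudProjector()(pts).shape)          # torch.Size([2, 64, 4096])
```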

Scalable 3D captioning with pretrained models

T Luo, C Rockwell, H Lee… - Advances in Neural …, 2024 - proceedings.neurips.cc
We introduce Cap3D, an automatic approach for generating descriptive text for 3D objects.
This approach utilizes pretrained models from image captioning, image-text alignment, and …
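
A minimal sketch of a Cap3D-style pipeline, under the assumption that descriptive text comes from captioning 2D renders of an object and ranking the candidates with an image-text alignment model. The model choices (BLIP, CLIP) and render file names below are illustrative; the paper's exact models and its final caption consolidation step are omitted.

```python
# Caption each rendered view of a 3D object, then keep the caption whose CLIP
# score against its own render is highest (a stand-in for alignment-based filtering).
import torch
from PIL import Image
from transformers import (BlipProcessor, BlipForConditionalGeneration,
                          CLIPProcessor, CLIPModel)

blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

def caption_renders(render_paths):
    best_caption, best_score = None, float("-inf")
    for path in render_paths:                    # hypothetical render file names
        image = Image.open(path).convert("RGB")
        with torch.no_grad():
            cap_ids = blip.generate(**blip_proc(image, return_tensors="pt"),
                                    max_new_tokens=30)
            caption = blip_proc.decode(cap_ids[0], skip_special_tokens=True)
            clip_in = clip_proc(text=[caption], images=image,
                                return_tensors="pt", padding=True)
            score = clip(**clip_in).logits_per_image[0, 0].item()
        if score > best_score:
            best_caption, best_score = caption, score
    return best_caption

# Usage with hypothetical renders of one object:
# print(caption_renders(["view_0.png", "view_1.png", "view_2.png"]))
```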

ULIP-2: Towards scalable multimodal pre-training for 3D understanding

L Xue, N Yu, S Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advancements in multimodal pre-training have shown promising efficacy in 3D
representation learning by aligning multimodal features across 3D shapes, their 2D …

OpenShape: Scaling up 3D shape representation towards open-world understanding

M Liu, R Shi, K Kuang, Y Zhu, X Li… - Advances in neural …, 2024 - proceedings.neurips.cc
We introduce OpenShape, a method for learning multi-modal joint representations of text,
image, and point clouds. We adopt the commonly used multi-modal contrastive learning …
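
The "commonly used multi-modal contrastive learning" mentioned here (and underlying ULIP-2 above) can be summarized as aligning a trainable point-cloud encoder to frozen text and image embeddings, e.g. from CLIP, with a symmetric InfoNCE loss. The sketch below uses illustrative stand-in encoders and dimensions, not the papers' architectures.

```python
# CLIP-style contrastive alignment of point-cloud embeddings to frozen text/image
# embeddings; only the point encoder is trained in this sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPointEncoder(nn.Module):
    def __init__(self, embed_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 256), nn.ReLU(), nn.Linear(256, embed_dim))

    def forward(self, xyz):                       # (B, N, 3)
        return self.mlp(xyz).max(dim=1).values    # global max-pool -> (B, embed_dim)

def contrastive_loss(point_emb, other_emb, temperature=0.07):
    """Symmetric InfoNCE between point-cloud and text (or image) embeddings."""
    p = F.normalize(point_emb, dim=-1)
    o = F.normalize(other_emb, dim=-1)
    logits = p @ o.t() / temperature              # (B, B) similarity matrix
    targets = torch.arange(p.size(0), device=p.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

encoder = TinyPointEncoder()
points = torch.randn(8, 1024, 3)                  # a batch of point clouds
text_emb = torch.randn(8, 512)                    # placeholder frozen text embeddings
image_emb = torch.randn(8, 512)                   # placeholder frozen image embeddings
pc = encoder(points)
loss = contrastive_loss(pc, text_emb) + contrastive_loss(pc, image_emb)
loss.backward()                                   # gradients flow only into the point encoder
print(float(loss))
```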

ShapeLLM: Universal 3D object understanding for embodied interaction

Z Qi, R Dong, S Zhang, H Geng, C Han, Z Ge… - … on Computer Vision, 2024 - Springer
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …

Towards open vocabulary learning: A survey

J Wu, X Li, S Xu, H Yuan, H Ding… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …

Distilling large vision-language model with out-of-distribution generalizability

X Li, Y Fang, M Liu, Z Ling, Z Tu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Large vision-language models have achieved outstanding performance, but their size and
computational requirements make their deployment on resource-constrained devices and …
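
A generic feature-distillation sketch of the setting this paper addresses: a small student is trained to match the (frozen) image embeddings of a large vision-language teacher so it can run on constrained hardware. The student backbone, loss choice, and tensor shapes below are illustrative assumptions; the paper's specific losses and out-of-distribution techniques are not reproduced.

```python
# One distillation step: match student image embeddings to frozen teacher embeddings
# with a cosine-similarity loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallStudent(nn.Module):
    def __init__(self, embed_dim=512):
        super().__init__()
        # Deliberately tiny convolutional encoder standing in for a mobile backbone.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, images):
        return self.backbone(images)

def distill_step(student, teacher_embed, images, optimizer):
    """One optimization step matching student embeddings to frozen teacher embeddings."""
    student_embed = student(images)
    loss = 1.0 - F.cosine_similarity(student_embed, teacher_embed, dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

student = SmallStudent()
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
images = torch.randn(4, 3, 224, 224)              # a batch of training images
teacher_embed = torch.randn(4, 512)               # placeholder frozen teacher outputs
print(distill_step(student, teacher_embed, images, optimizer))
```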