From Google Gemini to OpenAI Q* (Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …
Human motion generation: A survey
Human motion generation aims to generate natural human pose sequences and shows
immense potential for real-world applications. Substantial progress has been made recently …
Objaverse-XL: A universe of 10M+ 3D objects
Natural language processing and 2D vision models have attained remarkable proficiency on
many tasks primarily by escalating the scale of training data. However, 3D vision tasks have …
Foundation models in robotics: Applications, challenges, and the future
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …
PointLLM: Empowering large language models to understand point clouds
The unprecedented advancements in Large Language Models (LLMs) have shown a
profound impact on natural language processing but are yet to fully embrace the realm of 3D …
CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets
In the realm of digital creativity, our potential to craft intricate 3D worlds from imagination is
often hampered by the limitations of existing digital tools, which demand extensive expertise …
ShapeLLM: Universal 3D object understanding for embodied interaction
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …
VLMEvalKit: An open-source toolkit for evaluating large multi-modality models
We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models
based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework …
Point-Bind & Point-LLM: Aligning point cloud with multi-modality for 3D understanding, generation, and instruction following
We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image,
language, audio, and video. Guided by ImageBind, we construct a joint embedding space …
DILF: Differentiable rendering-based multi-view Image–Language Fusion for zero-shot 3D shape understanding
Zero-shot 3D shape understanding aims to recognize “unseen” 3D categories that are not
present in training data. Recently, Contrastive Language–Image Pre-training (CLIP) has …