From Google Gemini to OpenAI Q* (Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape

TR McIntosh, T Susnjak, T Liu, P Watters… - arXiv preprint arXiv …, 2023 - arxiv.org
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …

Human motion generation: A survey

W Zhu, X Ma, D Ro, H Ci, J Zhang, J Shi… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Human motion generation aims to generate natural human pose sequences and shows
immense potential for real-world applications. Substantial progress has been made recently …

Objaverse-XL: A universe of 10M+ 3D objects

M Deitke, R Liu, M Wallingford, H Ngo… - Advances in …, 2024 - proceedings.neurips.cc
Natural language processing and 2D vision models have attained remarkable proficiency on
many tasks primarily by escalating the scale of training data. However, 3D vision tasks have …

Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

PointLLM: Empowering large language models to understand point clouds

R Xu, X Wang, T Wang, Y Chen, J Pang… - European Conference on …, 2024 - Springer
The unprecedented advancements in Large Language Models (LLMs) have shown a
profound impact on natural language processing but are yet to fully embrace the realm of 3D …

CLAY: A controllable large-scale generative model for creating high-quality 3D assets

L Zhang, Z Wang, Q Zhang, Q Qiu, A Pang… - ACM Transactions on …, 2024 - dl.acm.org
In the realm of digital creativity, our potential to craft intricate 3D worlds from imagination is
often hampered by the limitations of existing digital tools, which demand extensive expertise …

ShapeLLM: Universal 3D object understanding for embodied interaction

Z Qi, R Dong, S Zhang, H Geng, C Han, Z Ge… - … on Computer Vision, 2024 - Springer
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …

VLMEvalKit: An open-source toolkit for evaluating large multi-modality models

H Duan, J Yang, Y Qiao, X Fang, L Chen, Y Liu… - Proceedings of the …, 2024 - dl.acm.org
We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models
based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework …

Point-Bind & Point-LLM: Aligning point cloud with multi-modality for 3D understanding, generation, and instruction following

Z Guo, R Zhang, X Zhu, Y Tang, X Ma, J Han… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image,
language, audio, and video. Guided by ImageBind, we construct a joint embedding space …

DILF: Differentiable rendering-based multi-view Image–Language Fusion for zero-shot 3D shape understanding

X Ning, Z Yu, L Li, W Li, P Tiwari - Information Fusion, 2024 - Elsevier
Zero-shot 3D shape understanding aims to recognize “unseen” 3D categories that are not
present in training data. Recently, Contrastive Language–Image Pre-training (CLIP) has …