Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

Woodpecker: Hallucination correction for multimodal large language models

S Yin, C Fu, S Zhao, T Xu, H Wang, D Sui… - Science China …, 2024 - Springer
Hallucination is a big shadow hanging over the rapidly evolving multimodal large language
models (MLLMs), referring to generated text that is inconsistent with the image content …

PointMamba: A simple state space model for point cloud analysis

D Liang, X Zhou, W Xu, X Zhu, Z Zou, X Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have become one of the foundational architectures in point cloud analysis
tasks due to their excellent global modeling ability. However, the attention mechanism has …

A simple vision transformer for weakly semi-supervised 3D object detection

D Zhang, D Liang, Z Zou, J Li, X Ye… - Proceedings of the …, 2023 - openaccess.thecvf.com
Advanced 3D object detection methods usually rely on large-scale, elaborately labeled
datasets to achieve good performance. However, labeling the bounding boxes for the 3D …

SAM-6D: Segment anything model meets zero-shot 6D object pose estimation

J Lin, L Liu, D Lu, K Jia - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Zero-shot 6D object pose estimation involves the detection of novel objects with their 6D
poses in cluttered scenes, presenting significant challenges for model generalizability …

A survey on segment anything model (SAM): Vision foundation model meets prompt engineering

C Zhang, FD Puspitasari, S Zheng, C Li, Y Qiao… - arXiv preprint arXiv …, 2023 - arxiv.org
Segment anything model (SAM) developed by Meta AI Research has recently attracted
significant attention. Trained on a large segmentation dataset of over 1 billion masks, SAM is …

Chat-Scene: Bridging 3D scene and large language models with object identifiers

H Huang, Y Chen, Z Wang, R Huang, R Xu… - The Thirty-eighth …, 2024 - openreview.net
Recent advancements in 3D Large Language Models (LLMs) have demonstrated promising
capabilities for 3D scene understanding. However, previous methods exhibit deficiencies in …

Chat-3D v2: Bridging 3D scene and large language models with object identifiers

H Huang, Z Wang, R Huang, L Liu, X Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent research has shown the significant potential of Large Language Models (LLMs)
in handling challenging tasks within 3D scenes. However, current models are constrained to …

Black-box targeted adversarial attack on segment anything (SAM)

S Zheng, C Zhang, X Hao - IEEE Transactions on Multimedia, 2024 - ieeexplore.ieee.org
Deep recognition models are widely vulnerable to adversarial examples, which change the
model output by adding quasi-imperceptible perturbation to the image input. Recently …

Forging vision foundation models for autonomous driving: Challenges, methodologies, and opportunities

X Yan, H Zhang, Y Cai, J Guo, W Qiu, B Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
The rise of large foundation models, trained on extensive datasets, is revolutionizing the
field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by …