Foundation models in robotics: Applications, challenges, and the future
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …
Woodpecker: Hallucination correction for multimodal large language models
Hallucination is a big shadow hanging over the rapidly evolving multimodal large language
models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content …
PointMamba: A simple state space model for point cloud analysis
Transformers have become one of the foundational architectures in point cloud analysis
tasks due to their excellent global modeling ability. However, the attention mechanism has …
A simple vision transformer for weakly semi-supervised 3D object detection
Advanced 3D object detection methods usually rely on large-scale, elaborately labeled
datasets to achieve good performance. However, labeling the bounding boxes for the 3D …
SAM-6D: Segment Anything Model meets zero-shot 6D object pose estimation
Zero-shot 6D object pose estimation involves the detection of novel objects with their 6D
poses in cluttered scenes, presenting significant challenges for model generalizability …
A survey on Segment Anything Model (SAM): Vision foundation model meets prompt engineering
Segment Anything Model (SAM), developed by Meta AI Research, has recently attracted
significant attention. Trained on a large segmentation dataset of over 1 billion masks, SAM is …
Chat-Scene: Bridging 3D scene and large language models with object identifiers
Recent advancements in 3D Large Language Models (LLMs) have demonstrated promising
capabilities for 3D scene understanding. However, previous methods exhibit deficiencies in …
Chat-3D v2: Bridging 3D scene and large language models with object identifiers
Recent research has evidenced the significant potential of Large Language Models (LLMs)
in handling challenging tasks within 3D scenes. However, current models are constrained to …
Black-box targeted adversarial attack on Segment Anything (SAM)
Deep recognition models are widely vulnerable to adversarial examples, which change the
model output by adding quasi-imperceptible perturbations to the image input. Recently …
Forging vision foundation models for autonomous driving: Challenges, methodologies, and opportunities
The rise of large foundation models, trained on extensive datasets, is revolutionizing the
field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by …