Google Scholar

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com

We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

Cited by 129; 2 versions

[PDF] arxiv.org

Woodpecker: Hallucination correction for multimodal large language models

S Yin, C Fu, S Zhao, T Xu, H Wang, D Sui… - Science China …, 2024 - Springer

Hallucination is a big shadow hanging over the rapidly evolving multimodal large language
models (MLLMs), referring to generated text that is inconsistent with the image content …

Cited by 162; 2 versions

[PDF] arxiv.org

Pointmamba: A simple state space model for point cloud analysis

D Liang, X Zhou, W Xu, X Zhu, Z Zou, X Ye… - arXiv preprint arXiv …, 2024 - arxiv.org

Transformers have become one of the foundational architectures in point cloud analysis
tasks due to their excellent global modeling ability. However, the attention mechanism has …

Cited by 106; 2 versions

[PDF] thecvf.com

A simple vision transformer for weakly semi-supervised 3d object detection

D Zhang, D Liang, Z Zou, J Li, X Ye… - Proceedings of the …, 2023 - openaccess.thecvf.com

Advanced 3D object detection methods usually rely on large-scale, elaborately labeled
datasets to achieve good performance. However, labeling the bounding boxes for the 3D …

Cited by 27; 4 versions

[PDF] thecvf.com

Sam-6d: Segment anything model meets zero-shot 6d object pose estimation

J Lin, L Liu, D Lu, K Jia - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Zero-shot 6D object pose estimation involves the detection of novel objects with their 6D
poses in cluttered scenes presenting significant challenges for model generalizability …

Cited by 39; 3 versions

[PDF] arxiv.org

A survey on segment anything model (sam): Vision foundation model meets prompt engineering

C Zhang, FD Puspitasari, S Zheng, C Li, Y Qiao… - arXiv preprint arXiv …, 2023 - arxiv.org

Segment anything model (SAM) developed by Meta AI Research has recently attracted
significant attention. Trained on a large segmentation dataset of over 1 billion masks, SAM is …

Cited by 70; 4 versions

[PDF] openreview.net

Chat-scene: Bridging 3d scene and large language models with object identifiers

H Huang, Y Chen, Z Wang, R Huang, R Xu… - The Thirty-eighth …, 2024 - openreview.net

Recent advancements in 3D Large Language Models (LLMs) have demonstrated promising
capabilities for 3D scene understanding. However, previous methods exhibit deficiencies in …

Cited by 12

[PDF] arxiv.org

Chat-3d v2: Bridging 3d scene and large language models with object identifiers

H Huang, Z Wang, R Huang, L Liu, X Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org

Recent research has evidenced the significant potentials of Large Language Models (LLMs)
in handling challenging tasks within 3D scenes. However, current models are constrained to …

Cited by 25; 2 versions

[PDF] arxiv.org

Black-box targeted adversarial attack on segment anything (sam)

S Zheng, C Zhang, X Hao - IEEE Transactions on Multimedia, 2024 - ieeexplore.ieee.org

Deep recognition models are widely vulnerable to adversarial examples, which change the
model output by adding quasi-imperceptible perturbation to the image input. Recently …

Cited by 10; 3 versions

[PDF] arxiv.org

Forging vision foundation models for autonomous driving: Challenges, methodologies, and opportunities

X Yan, H Zhang, Y Cai, J Guo, W Qiu, B Gao… - arXiv preprint arXiv …, 2024 - arxiv.org

The rise of large foundation models, trained on extensive datasets, is revolutionizing the
field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by …

Cited by 17; 2 versions

Sam3d: Zero-shot 3d object detection via segment anything model

Foundation models in robotics: Applications, challenges, and the future