A survey on text-guided 3D visual grounding: elements, recent advances, and future directions

D Liu, Y Liu, W Huang, W Hu - arxiv …

… Manga
Y Wu, X Hu, Y Sun, Y Zhou, W Zhu, F Rao… - arxiv preprint arxiv …, 2024 - arxiv.org
Video Large Language Models (Vid-LLMs) have made remarkable advancements in
comprehending video content for QA dialogue. However, they struggle to extend this visual …

GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models

Z Qi, Z Zhang, Y Fang, J Wang, H Zhao - arxiv preprint arxiv:2501.01428, 2025 - arxiv.org
In recent years, 2D Vision-Language Models (VLMs) have made significant strides in image-
text understanding tasks. However, their performance in 3D spatial comprehension, which is …

Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention

H Zhang, CA Yang, RA Yeh - arxiv preprint arxiv:2410.22306, 2024 - arxiv.org
Multi-object 3D Grounding involves locating 3D boxes based on a given query phrase from
a point cloud. It is a challenging and significant task with numerous applications in visual …

DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding

H Zheng, H Shi, Q Peng, YX Chng, R Huang… - … Conference on Learning … - openreview.net
Enabling intelligent agents to comprehend and interact with 3D environments through
natural language is crucial for advancing robotics and human-computer interaction. A …