Google Scholar

PIVOT: Iterative visual prompting elicits actionable knowledge for VLMs

S Nasiriany, F Xia, W Yu, T Xiao, J Liang… - arXiv preprint arXiv …, 2024 - arxiv.org

Vision language models (VLMs) have shown impressive capabilities across a variety of
tasks, from logical reasoning to visual understanding. This opens the door to richer …

Cited by 84 · All 7 versions

When LLMs step into the 3D world: A survey and meta-analysis of 3D tasks via multi-modal large language models

X Ma, Y Bhalgat, B Smart, S Chen, X Li, J Ding… - arXiv preprint arXiv …, 2024 - arxiv.org

As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs)
has seen rapid progress, offering unprecedented capabilities for understanding and …

Cited by 14 · All 5 versions

Coarse correspondence elicit 3D spacetime understanding in multimodal language model

B Liu, Y Dong, Y Wang, Y Rao, Y Tang, WC Ma… - arXiv preprint arXiv …, 2024 - arxiv.org

Multimodal language models (MLLMs) are increasingly being implemented in real-world
environments, necessitating their ability to interpret 3D spaces and comprehend temporal …

Cited by 12 · All 3 versions

Visual prompting in multimodal large language models: A survey

J Wu, Z Zhang, Y Xia, X Li, Z Xia, A Chang, T Yu… - arXiv preprint arXiv …, 2024 - arxiv.org

Multimodal large language models (MLLMs) equip pre-trained large-language models
(LLMs) with visual capabilities. While textual prompting in LLMs has been widely studied …

Cited by 8 · All 3 versions

GenSim2: Scaling robot data generation with multi-modal and reasoning LLMs

P Hua, M Liu, A Macaluso, Y Lin, W Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Robotic simulation today remains challenging to scale up due to the human efforts required
to create diverse simulation tasks and scenes. Simulation-trained policies also face …

Cited by 4 · All 3 versions

Visual Preference Inference: An Image Sequence-Based Preference Reasoning in Tabletop Object Manipulation

J Lee, S Park, Y Kwon, J Lee, M Ahn… - 2024 IEEE/RSJ …, 2024 - ieeexplore.ieee.org

In robotic object manipulation, human preferences can often be influenced by the visual
attributes of objects, such as color and shape. These properties play a crucial role in …

All 4 versions

Coarse Correspondences Boost 3D Spacetime Understanding in Multimodal Language Model

B Liu, Y Dong, Y Wang, Z Ma, Y Tang, L Tang, Y Rao… - openreview.net

Multimodal language models (MLLMs) are increasingly being applied in real-world
environments, necessitating their ability to interpret 3D spaces and comprehend temporal …

3DAxiesPrompts: Unleashing the 3D spatial task capabilities of GPT-4V

PIVOT: Iterative visual prompting elicits actionable knowledge for VLMs
