Sparkle: Mastering basic spatial capabilities in vision language models elicits generalization to composite spatial reasoning
Vision language models (VLMs) have demonstrated impressive performance across a wide
range of downstream tasks. However, their proficiency in spatial reasoning remains limited …
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
This work explores expanding the capabilities of large language models (LLMs) pretrained
on text to generate 3D meshes within a unified model. This offers key advantages of (1) …