Can Large Language Models Understand Symbolic Graphics Programs?

Z Qiu, W Liu, H Feng, Z Liu, TZ Xiao, KM Collins… - arXiv preprint arXiv …, 2024 - arxiv.org
Against the backdrop of enthusiasm for large language models (LLMs), there is an urgent
need to scientifically assess their capabilities and shortcomings. This is nontrivial in part …

ChatGarment: Garment Estimation, Generation and Editing via Large Language Models

S Bian, C Xu, Y Xiu, A Grigorev, Z Liu, C Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce ChatGarment, a novel approach that leverages large vision-language models
(VLMs) to automate the estimation, generation, and editing of 3D garments from images or …

GRS: Generating Robotic Simulation Tasks from Real-World Images

A Zook, FY Sun, J Spjut, V Blukis, S Birchfield… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce GRS (Generating Robotic Simulation tasks), a novel system to address the
challenge of real-to-sim in robotics, computer vision, and AR/VR. GRS enables the creation …

Reconstructing Animals and the Wild

P Kulits, MJ Black, S Zuffi - arXiv preprint arXiv:2411.18807, 2024 - arxiv.org
The idea of 3D reconstruction as scene understanding is foundational in computer vision.
Reconstructing 3D scenes from 2D visual observations requires strong priors to …

Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models

R Wu, W Su, J Liao - arXiv preprint arXiv:2411.16602, 2024 - arxiv.org
Scalable Vector Graphics (SVG) has become the de facto standard for vector graphics in
digital design, offering resolution independence and precise control over individual …

DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation

W Zhao, YP Cao, J Xu, Y Dong, Y Shan - arXiv preprint arXiv:2412.15200, 2024 - arxiv.org
Procedural Content Generation (PCG) is powerful in creating high-quality 3D contents, yet
controlling it to produce desired shapes is difficult and often requires extensive parameter …

RLS3: RL-Based Synthetic Sample Selection to Enhance Spatial Reasoning in Vision-Language Models for Indoor Autonomous Perception

JR Waite, MZ Hasan, Q Liu, Z Jiang, C Hegde… - arXiv preprint arXiv …, 2025 - arxiv.org
Vision-language model (VLM) fine-tuning for application-specific visual grounding based on
natural language instructions has become one of the most popular approaches for learning …

R2D3: Imparting Spatial Reasoning by Reconstructing 3D Scenes from 2D Images

A Ray, D Bashkirova, R Tan, KH Zeng, BA Plummer… - openreview.net
Cognitive scientists herald 3D spatial reasoning as a fundamental foundation for all
intellectual processes. Multimodal large language models (MLMs), which have been widely …