Real-world robot applications of foundation models: A review

K Kawaharazuka, T Matsushima… - Advanced …, 2024 - Taylor & Francis
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-
Language Models (VLMs), trained on extensive data, facilitate flexible application across …

Autoregressive image generation without vector quantization

T Li, Y Tian, H Li, M Deng, K He - Advances in Neural …, 2025 - proceedings.neurips.cc
Conventional wisdom holds that autoregressive models for image generation are typically
accompanied by vector-quantized tokens. We observe that while a discrete-valued space …

MOKA: Open-vocabulary robotic manipulation through mark-based visual prompting

F Liu, K Fang, P Abbeel, S Levine - First Workshop on Vision …, 2024 - openreview.net
Open-vocabulary generalization requires robotic systems to perform tasks involving complex
and diverse environments and task goals. While the recent advances in vision language …

Mobile ALOHA: Learning bimanual mobile manipulation with low-cost whole-body teleoperation

Z Fu, TZ Zhao, C Finn - arXiv preprint arXiv:2401.02117, 2024 - arxiv.org
Imitation learning from human demonstrations has shown impressive performance in
robotics. However, most results focus on table-top manipulation, lacking the mobility and …

Scaling proprioceptive-visual learning with heterogeneous pre-trained transformers

L Wang, X Chen, J Zhao, K He - arXiv preprint arXiv:2409.20537, 2024 - arxiv.org
One of the roadblocks for training generalist robotic models today is heterogeneity. Previous
robot learning methods often collect data to train with one specific embodiment for one task …

Scaling cross-embodied learning: One policy for manipulation, navigation, locomotion and aviation

R Doshi, H Walke, O Mees, S Dasari… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern machine learning systems rely on large datasets to attain broad generalization, and
this often poses a challenge in robot learning, where each robotic platform and task might …

GR-2: A generative video-language-action model with web-scale knowledge for robot manipulation

CL Cheang, G Chen, Y Jing, T Kong, H Li, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
We present GR-2, a state-of-the-art generalist robot agent for versatile and generalizable
robot manipulation. GR-2 is first pre-trained on a vast number of Internet videos to capture …

π0: A Vision-Language-Action Flow Model for General Robot Control

K Black, N Brown, D Driess, A Esmail, M Equi… - arXiv preprint arXiv …, 2024 - arxiv.org
Robot learning holds tremendous promise to unlock the full potential of flexible, general, and
dexterous robot systems, as well as to address some of the deepest questions in artificial …

TinyVLA: Towards fast, data-efficient vision-language-action models for robotic manipulation

J Wen, Y Zhu, J Li, M Zhu, K Wu, Z Xu, N Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-Language-Action (VLA) models have shown remarkable potential in visuomotor
control and instruction comprehension through end-to-end learning processes. However …

QueST: Self-supervised skill abstractions for learning continuous control

A Mete, H Xue, A Wilcox, Y Chen… - Advances in Neural …, 2025 - proceedings.neurips.cc
Generalization capability, or rather a lack thereof, is one of the most important unsolved
problems in the field of robot learning, and while several large-scale efforts have set out to …