Enhancing autonomous system security and resilience with generative AI: A comprehensive survey

M Andreoni, WT Lunardi, G Lawton, S Thakkar - IEEE Access, 2024 - ieeexplore.ieee.org
This survey explores the transformative role of Generative Artificial Intelligence (GenAI) in
enhancing the trustworthiness, reliability, and security of autonomous systems such as …

FMB: a functional manipulation benchmark for generalizable robotic learning

J Luo, C Xu, F Liu, L Tan, Z Lin, J Wu… - The International Journal of Robotics Research, 2023 - journals.sagepub.com
In this paper, we propose a real-world benchmark for studying robotic learning in the context
of functional manipulation: a robot needs to accomplish complex long-horizon behaviors by …

Benchmark evaluations, applications, and challenges of large vision language models: A survey

Z Li, X Wu, H Du, H Nghiem, G Shi - arXiv preprint arXiv:2501.02189, 2025 - arxiv.org
Multimodal Vision Language Models (VLMs) have emerged as a transformative technology
at the intersection of computer vision and natural language processing, enabling machines …

Manipulate-anything: Automating real-world robots using vision-language models

J Duan, W Yuan, W Pumacay, YR Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale endeavors and widespread community efforts such as Open-X-Embodiment
have contributed to growing the scale of robot demonstration data. However, there is still an …

Towards efficient llm grounding for embodied multi-agent collaboration

Y Zhang, S Yang, C Bai, F Wu, X Li, Z Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Grounding the reasoning ability of large language models (LLMs) for embodied tasks is
challenging due to the complexity of the physical world. In particular, LLM planning for multi …

Latent action pretraining from videos

S Ye, J Jang, B Jeon, S Joo, J Yang, B Peng… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Latent Action Pretraining for general Action models (LAPA), an unsupervised
method for pretraining Vision-Language-Action (VLA) models without ground-truth robot …

Policy adaptation via language optimization: Decomposing tasks for few-shot imitation

V Myers, BC Zheng, O Mees, S Levine… - arXiv preprint arXiv …, 2024 - arxiv.org
Learned language-conditioned robot policies often struggle to effectively adapt to new real-
world tasks even when pre-trained across a diverse set of instructions. We propose a novel …

In-context imitation learning via next-token prediction

L Fu, H Huang, G Datta, LY Chen, WCH Panitch… - arXiv preprint arXiv …, 2024 - arxiv.org
We explore how to enhance next-token prediction models to perform in-context imitation
learning on a real robot, where the robot executes new tasks by interpreting contextual …

Autonomous improvement of instruction following skills via foundation models

Z Zhou, P Atreya, A Lee, H Walke, O Mees… - arXiv preprint arXiv …, 2024 - arxiv.org
Intelligent instruction-following robots capable of improving from autonomously collected
experience have the potential to transform robot learning: instead of collecting costly …

Thinking in space: How multimodal large language models see, remember, and recall spaces

J Yang, S Yang, AW Gupta, R Han, L Fei-Fei… - arXiv preprint arXiv …, 2024 - arxiv.org
Humans possess the visual-spatial intelligence to remember spaces from sequential visual
observations. However, can Multimodal Large Language Models (MLLMs) trained on million …