Enhancing autonomous system security and resilience with generative AI: A comprehensive survey

M Andreoni, WT Lunardi, G Lawton, S Thakkar - IEEE Access, 2024 - ieeexplore.ieee.org
This survey explores the transformative role of Generative Artificial Intelligence (GenAI) in
enhancing the trustworthiness, reliability, and security of autonomous systems such as …

FMB: a functional manipulation benchmark for generalizable robotic learning

J Luo, C Xu, F Liu, L Tan, Z Lin, J Wu… - The International Journal of Robotics Research, 2023 - journals.sagepub.com
In this paper, we propose a real-world benchmark for studying robotic learning in the context
of functional manipulation: a robot needs to accomplish complex long-horizon behaviors by …

Benchmark evaluations, applications, and challenges of large vision language models: A survey

Z Li, X Wu, H Du, H Nghiem, G Shi - arXiv preprint arXiv:2501.02189, 2025 - arxiv.org
Multimodal Vision Language Models (VLMs) have emerged as a transformative technology
at the intersection of computer vision and natural language processing, enabling machines …

Manipulate-anything: Automating real-world robots using vision-language models

J Duan, W Yuan, W Pumacay, YR Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale endeavors and widespread community efforts such as Open-X-Embodiment
have contributed to growing the scale of robot demonstration data. However, there is still an …

Towards efficient llm grounding for embodied multi-agent collaboration

Y Zhang, S Yang, C Bai, F Wu, X Li, Z Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Grounding the reasoning ability of large language models (LLMs) for embodied tasks is
challenging due to the complexity of the physical world. In particular, LLM planning for multi …

Latent action pretraining from videos

S Ye, J Jang, B Jeon, S Joo, J Yang, B Peng… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Latent Action Pretraining for general Action models (LAPA), an unsupervised
method for pretraining Vision-Language-Action (VLA) models without ground-truth robot …

Policy adaptation via language optimization: Decomposing tasks for few-shot imitation

V Myers, BC Zheng, O Mees, S Levine… - arXiv preprint arXiv …, 2024 - arxiv.org
Learned language-conditioned robot policies often struggle to effectively adapt to new real-
world tasks even when pre-trained across a diverse set of instructions. We propose a novel …

In-context imitation learning via next-token prediction

L Fu, H Huang, G Datta, LY Chen, WCH Panitch… - arXiv preprint arXiv …, 2024 - arxiv.org
We explore how to enhance next-token prediction models to perform in-context imitation
learning on a real robot, where the robot executes new tasks by interpreting contextual …

Autonomous improvement of instruction following skills via foundation models

Z Zhou, P Atreya, A Lee, H Walke, O Mees… - arXiv preprint arXiv …, 2024 - arxiv.org
Intelligent instruction-following robots capable of improving from autonomously collected
experience have the potential to transform robot learning: instead of collecting costly …

Thinking in space: How multimodal large language models see, remember, and recall spaces

J Yang, S Yang, AW Gupta, R Han, L Fei-Fei… - arXiv preprint arXiv …, 2024 - arxiv.org
Humans possess the visual-spatial intelligence to remember spaces from sequential visual
observations. However, can Multimodal Large Language Models (MLLMs) trained on million …