Unified-IO 2: Scaling autoregressive multimodal models with vision, language, audio, and action

J Lu, C Clark, S Lee, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Unified-IO 2, a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text, images, audio, and/or videos as input and can …

A survey of optimization-based task and motion planning: From classical to learning approaches

Z Zhao, S Cheng, Y Ding, Z Zhou… - IEEE/ASME …, 2024 - ieeexplore.ieee.org
Task and motion planning (TAMP) integrates high-level task planning and low-level motion
planning to equip robots with the autonomy to effectively reason over long-horizon, dynamic …

Generative AI for self-adaptive systems: State of the art and research roadmap

J Li, M Zhang, N Li, D Weyns, Z Jin, K Tei - ACM Transactions on …, 2024 - dl.acm.org
Self-adaptive systems (SASs) are designed to handle changes and uncertainties through a
feedback loop with four core functionalities: monitoring, analyzing, planning, and execution …

Diffusion models for reinforcement learning: A survey

Z Zhu, H Zhao, H He, Y Zhong, S Zhang, H Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models surpass previous generative models in sample quality and training
stability. Recent works have shown the advantages of diffusion models in improving …

PoCo: Policy composition from and for heterogeneous robot learning

L Wang, J Zhao, Y Du, EH Adelson… - arXiv preprint arXiv …, 2024 - arxiv.org
Training general robotic policies from heterogeneous data for different tasks is a significant
challenge. Existing robotic datasets vary in different modalities such as color, depth, tactile …

Compositional generative modeling: A single model is not all you need

Y Du, L Kaelbling - arXiv preprint arXiv:2402.01103, 2024 - arxiv.org
Large monolithic generative models trained on massive amounts of data have become an
increasingly dominant approach in AI research. In this paper, we argue that we should …

Language-driven 6-DoF grasp detection using negative prompt guidance

T Nguyen, MN Vu, B Huang, A Vuong, Q Vuong… - … on Computer Vision, 2024 - Springer
6-DoF grasp detection has been a fundamental and challenging problem in robotic vision.
While previous works have focused on ensuring grasp stability, they often do not consider …

Practice makes perfect: Planning to learn skill parameter policies

N Kumar, T Silver, W McClinton, L Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
One promising approach towards effective robot decision making in complex, long-horizon
tasks is to sequence together parameterized skills. We consider a setting where a robot is …

Deep generative models in robotics: A survey on learning from multimodal demonstrations

J Urain, A Mandlekar, Y Du, M Shafiullah, D Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning from Demonstrations, the field that proposes to learn robot behavior models from
data, is gaining popularity with the emergence of deep generative models. Although the …

ReorientDiff: Diffusion model based reorientation for object manipulation

UA Mishra, Y Chen - 2024 IEEE International Conference on …, 2024 - ieeexplore.ieee.org
The ability to manipulate objects in desired configurations is a fundamental requirement for
robots to complete various practical applications. While certain goals can be achieved by …