A survey of embodied AI: From simulators to research tasks

J Duan, S Yu, HL Tan, H Zhu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
There has been an emerging paradigm shift from the era of “internet AI” to “embodied AI,”
where AI algorithms and agents no longer learn from datasets of images, videos or text …

Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

Inner monologue: Embodied reasoning through planning with language models

W Huang, F Xia, T Xiao, H Chan, J Liang… - arxiv preprint arxiv …, 2022 - arxiv.org
Recent works have shown how the reasoning capabilities of Large Language Models
(LLMs) can be applied to domains beyond natural language processing, such as planning …

Foundation models for decision making: Problems, methods, and opportunities

S Yang, O Nachum, Y Du, J Wei, P Abbeel… - arxiv preprint arxiv …, 2023 - arxiv.org
Foundation models pretrained on diverse data at scale have demonstrated extraordinary
capabilities in a wide range of vision and language tasks. When such models are deployed …

On Transforming Reinforcement Learning With Transformers: The Development Trajectory

S Hu, L Shen, Y Zhang, Y Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Transformers, originally devised for natural language processing (NLP), have also produced
significant successes in computer vision (CV). Due to their strong expression power …

Conformal prediction for uncertainty-aware planning with diffusion dynamics model

J Sun, Y Jiang, J Qiu, P Nobel… - Advances in …, 2023 - proceedings.neurips.cc
Robotic applications often involve working in environments that are uncertain, dynamic, and
partially observable. Recently, diffusion models have been proposed for learning trajectory …

SlotFormer: Unsupervised visual dynamics simulation with object-centric models

Z Wu, N Dvornik, K Greff, T Kipf, A Garg - arxiv preprint arxiv:2210.05861, 2022 - arxiv.org
Understanding dynamics from visual observations is a challenging problem that requires
disentangling individual objects from the scene and learning their interactions. While recent …

Procedure-aware pretraining for instructional video understanding

H Zhou, R Martín-Martín, M Kapadia… - Proceedings of the …, 2023 - openaccess.thecvf.com
Our goal is to learn a video representation that is useful for downstream procedure
understanding tasks in instructional videos. Due to the small amount of available …

AntGPT: Can large language models help long-term action anticipation from videos?

Q Zhao, S Wang, C Zhang, C Fu, MQ Do… - arxiv preprint arxiv …, 2023 - arxiv.org
Can we better anticipate an actor's future actions (e.g., mix eggs) by knowing what commonly
happens after his/her current action (e.g., crack eggs)? What if we also know the longer-term …

A survey on transformers in reinforcement learning

W Li, H Luo, Z Lin, C Zhang, Z Lu, D Ye - arxiv preprint arxiv:2301.03044, 2023 - arxiv.org
Transformer has been considered the dominating neural architecture in NLP and CV, mostly
under supervised settings. Recently, a similar surge of using Transformers has appeared in …