Learning multi-object dynamics with compositional neural radiance fields

D Driess, Z Huang, Y Li, R Tedrake… - Conference on Robot …, 2023 - proceedings.mlr.press
We present a method to learn compositional multi-object dynamics models from image
observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and …

Joint hand motion and interaction hotspots prediction from egocentric videos

S Liu, S Tripathi, S Majumdar… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
We propose to forecast future hand-object interactions given an egocentric video. Instead of
predicting action labels or pixels, we directly predict the hand motion trajectory and the …

SlotFormer: Unsupervised visual dynamics simulation with object-centric models

Z Wu, N Dvornik, K Greff, T Kipf, A Garg - arXiv preprint arXiv:2210.05861, 2022 - arxiv.org
Understanding dynamics from visual observations is a challenging problem that requires
disentangling individual objects from the scene and learning their interactions. While recent …

Dynamic visual reasoning by learning differentiable physics models from video and language

M Ding, Z Chen, T Du, P Luo… - Advances In Neural …, 2021 - proceedings.neurips.cc
In this work, we propose a unified framework, called Visual Reasoning with Differentiable
Physics (VRDP), that can jointly learn visual concepts and infer physics models of objects …

Graph inverse reinforcement learning from diverse videos

S Kumar, J Zamora, N Hansen… - … on Robot Learning, 2023 - proceedings.mlr.press
Research on Inverse Reinforcement Learning (IRL) from third-person videos has
shown encouraging results on removing the need for manual reward design for robotic …

Neural production systems

A Goyal, A Didolkar… - Advances in …, 2021 - proceedings.neurips.cc
Visual environments are structured, consisting of distinct objects or entities. These entities
have properties, visible or latent, that determine the manner in which they interact with one …

Visual reinforcement learning with self-supervised 3D representations

Y Ze, N Hansen, Y Chen, M Jain… - IEEE Robotics and …, 2023 - ieeexplore.ieee.org
A prominent approach to visual Reinforcement Learning (RL) is to learn an internal state
representation using self-supervised methods, which has the potential benefit of improved …

Physion: Evaluating physical prediction from vision in humans and machines

DM Bear, E Wang, D Mrowca, FJ Binder… - arXiv preprint arXiv …, 2021 - arxiv.org
While current vision algorithms excel at many challenging tasks, it is unclear how well they
understand the physical dynamics of real-world environments. Here we introduce Physion, a …

Progressive instance-aware feature learning for compositional action recognition

R Yan, L Xie, X Shu, L Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In order to enable the model to generalize to unseen “action-objects” (compositional action),
previous methods encode multiple pieces of information (i.e., the appearance, position, and …

VDT: General-purpose video diffusion transformers via mask modeling

H Lu, G Yang, N Fei, Y Huo, Z Lu, P Luo… - arXiv preprint arXiv …, 2023 - arxiv.org
This work introduces Video Diffusion Transformer (VDT), which pioneers the use of
transformers in diffusion-based video generation. It features transformer blocks with …