Bridging the gap to real-world object-centric learning

M Seitzer, M Horn, A Zadaianchuk, D Zietlow… - arXiv preprint arXiv …, 2022 - arxiv.org
Humans naturally decompose their environment into entities at the appropriate level of
abstraction to act in the world. Allowing machine learning algorithms to derive this …

Simple unsupervised object-centric learning for complex and naturalistic videos

G Singh, YF Wu, S Ahn - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Unsupervised object-centric learning aims to represent the modular, compositional, and
causal structure of a scene as a set of object representations and thereby promises to …

Towards semantic equivalence of tokenization in multimodal LLM

S Wu, H Fei, X Li, J Ji, H Zhang, TS Chua… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in
processing vision-language tasks. One of the crux of MLLMs lies in vision tokenization …

Illiterate DALL-E learns to compose

G Singh, F Deng, S Ahn - arXiv preprint arXiv:2110.11405, 2021 - arxiv.org
Although DALL-E has shown an impressive ability of composition-based systematic
generalization in image generation, it requires the dataset of text-image pairs and the …

Object-centric learning for real-world videos by predicting temporal feature similarities

A Zadaianchuk, M Seitzer… - Advances in Neural …, 2023 - proceedings.neurips.cc
Unsupervised video-based object-centric learning is a promising avenue to learn structured
representations from large, unlabeled video collections, but previous approaches have only …

SlotDiffusion: Object-centric generative modeling with diffusion models

Z Wu, J Hu, W Lu, I Gilitschenski… - Advances in Neural …, 2023 - proceedings.neurips.cc
Object-centric learning aims to represent visual data with a set of object entities (aka slots),
providing structured representations that enable systematic generalization. Leveraging …

Provably learning object-centric representations

J Brady, RS Zimmermann, Y Sharma… - International …, 2023 - proceedings.mlr.press
Learning structured representations of the visual world in terms of objects promises to
significantly improve the generalization abilities of current machine learning models. While …

Object-centric slot diffusion

J Jiang, F Deng, G Singh, S Ahn - arXiv preprint arXiv:2303.10834, 2023 - arxiv.org
The recent success of transformer-based image generative models in object-centric learning
highlights the importance of powerful image generators for handling complex scenes …

Decomposing 3D scenes into objects via unsupervised volume segmentation

K Stelzner, K Kersting, AR Kosiorek - arXiv preprint arXiv:2104.01148, 2021 - arxiv.org
We present ObSuRF, a method which turns a single image of a scene into a 3D model
represented as a set of Neural Radiance Fields (NeRFs), with each NeRF corresponding to …

ClevrTex: A texture-rich benchmark for unsupervised multi-object segmentation

L Karazija, I Laina, C Rupprecht - arXiv preprint arXiv:2111.10265, 2021 - arxiv.org
There has been a recent surge in methods that aim to decompose and segment scenes into
multiple objects in an unsupervised manner, i.e., unsupervised multi-object segmentation …