Object-centric slot diffusion

J Jiang, F Deng, G Singh, S Ahn - arXiv preprint arXiv:2303.10834, 2023 - arxiv.org
The recent success of transformer-based image generative models in object-centric learning
highlights the importance of powerful image generators for handling complex scenes …

Object-centric learning for real-world videos by predicting temporal feature similarities

A Zadaianchuk, M Seitzer… - Advances in Neural …, 2024 - proceedings.neurips.cc
Unsupervised video-based object-centric learning is a promising avenue to learn structured
representations from large, unlabeled video collections, but previous approaches have only …

Zero-shot object-centric representation learning

A Didolkar, A Zadaianchuk, A Goyal, M Mozer… - arXiv preprint arXiv …, 2024 - arxiv.org
The goal of object-centric representation learning is to decompose visual scenes into a
structured representation that isolates the entities. Recent successes have shown that object …

Exploring the effectiveness of object-centric representations in visual question answering: Comparative insights with foundation models

AMK Mamaghan, S Papa, KH Johansson… - arXiv preprint arXiv …, 2024 - arxiv.org
Object-centric (OC) representations, which represent the state of a visual scene by modeling
it as a composition of objects, have the potential to be used in various downstream tasks to …

SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers

I Kakogeorgiou, S Gidaris… - Proceedings of the …, 2024 - openaccess.thecvf.com
Unsupervised object-centric learning aims to decompose scenes into interpretable object
entities termed slots. Slot-based auto-encoders stand out as a prominent method for this …

Unsupervised object localization in the era of self-supervised ViTs: A survey

O Siméoni, É Zablocki, S Gidaris, G Puy… - International Journal of …, 2024 - Springer
The recent enthusiasm for open-world vision systems shows the high interest of the
community to perform perception tasks outside of the closed-vocabulary benchmark setups …

Guided diffusion from self-supervised diffusion features

VT Hu, Y Chen, M Caron, YM Asano… - arXiv preprint arXiv …, 2023 - arxiv.org
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by
the need for extra data annotation or classifier pretraining. That is why guidance was …

Layout-agnostic scene text image synthesis with diffusion models

Q Zhangli, J Jiang, D Liu, L Yu, X Dai… - 2024 IEEE/CVF …, 2024 - computer.org
While diffusion models have significantly advanced the quality of image generation, their
capability to accurately and coherently render text within these images remains a substantial …

Object-centric temporal consistency via conditional autoregressive inductive biases

C Meo, A Nakano, M Lică, A Didolkar, M Suzuki… - arXiv preprint arXiv …, 2024 - arxiv.org
Unsupervised object-centric learning from videos is a promising approach towards learning
compositional representations that can be applied to various downstream tasks, such as …

View-centric multi-object tracking with homographic matching in moving UAV

D Ji, S Gao, L Zhu, Q Zhu, Y Zhao, P Xu, H Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we address the challenge of multi-object tracking (MOT) in moving Unmanned
Aerial Vehicle (UAV) scenarios, where irregular flight trajectories, such as hovering, turning …