Which is the best way to organize/classify images by content?

A Bosch, X Munoz, R Marti - Image and vision computing, 2007 - Elsevier
Thousands of images are generated every day, which implies the necessity to classify,
organise and access them using an easy, faster and efficient way. Scene classification, the …

Localizing objects with self-supervised transformers and no labels

O Siméoni, G Puy, HV Vo, S Roburin, S Gidaris… - arxiv preprint arxiv …, 2021 - arxiv.org
Localizing objects in image collections without supervision can help to avoid expensive
annotation campaigns. We propose a simple approach to this problem, that leverages the …

Freesolo: Learning to segment objects without annotations

X Wang, Z Yu, S De Mello, J Kautz… - Proceedings of the …, 2022 - openaccess.thecvf.com
Instance segmentation is a fundamental vision task that aims to recognize and segment
each object in an image. However, it requires costly annotations such as bounding boxes …

Videos as space-time region graphs

X Wang, A Gupta - Proceedings of the European …, 2018 - openaccess.thecvf.com
How do humans recognize the action" opening a book"? We argue that there are two
important cues: modeling temporal shape dynamics and modeling functional relationships …

Scaling and benchmarking self-supervised visual representation learning

P Goyal, D Mahajan, A Gupta… - Proceedings of the ieee …, 2019 - openaccess.thecvf.com
Self-supervised learning aims to learn representations from the data itself without explicit
manual supervision. Existing efforts ignore a crucial aspect of self-supervised learning-the …

Visual relationship detection with language priors

C Lu, R Krishna, M Bernstein, L Fei-Fei - … 11–14, 2016, Proceedings, Part I …, 2016 - Springer
Visual relationships capture a wide variety of interactions between pairs of objects in images
(eg “man riding bicycle” and “man pushing bicycle”). Consequently, the set of possible …

Unsupervised representation learning by sorting sequences

HY Lee, JB Huang, M Singh… - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
We present an unsupervised representation learning approach using videos without
semantic labels. We leverage the temporal coherence as a supervisory signal by formulating …

Scene graph generation from objects, phrases and region captions

Y Li, W Ouyang, B Zhou, K Wang… - Proceedings of the …, 2017 - openaccess.thecvf.com
Object detection, scene graph generation and region captioning, which are three scene
understanding tasks at different semantic levels, are tied together: scene graphs are …

Probabilistic topic modeling in multilingual settings: An overview of its methodology and applications

I Vulić, W De Smet, J Tang, MF Moens - Information Processing & …, 2015 - Elsevier
Probabilistic topic models are unsupervised generative models which model document
content as a two-step generation process, that is, documents are observed as mixtures of …

Shuffle and learn: unsupervised learning using temporal order verification

I Misra, CL Zitnick, M Hebert - … , The Netherlands, October 11–14, 2016 …, 2016 - Springer
In this paper, we present an approach for learning a visual representation from the raw
spatiotemporal signals in videos. Our representation is learned without supervision from …