Region-based convolutional networks for accurate object detection and segmentation

R Girshick, J Donahue, T Darrell… - IEEE transactions on …, 2015 - ieeexplore.ieee.org
Object detection performance, as measured on the canonical PASCAL VOC Challenge
datasets, plateaued in the final years of the competition. The best-performing methods were …

Unsupervised visual representation learning by context prediction

C Doersch, A Gupta, AA Efros - Proceedings of the IEEE …, 2015 - cv-foundation.org
This work explores the use of spatial context as a source of free and plentiful supervisory
signal for training a rich visual representation. Given only a large, unlabeled image …

Edge boxes: Locating object proposals from edges

CL Zitnick, P Dollár - Computer Vision–ECCV 2014: 13th European …, 2014 - Springer
The use of object proposals is an effective recent approach for increasing the computational
efficiency of object detection. We propose a novel method for generating object bounding …

Selective search for object recognition

JRR Uijlings, KEA Van De Sande, T Gevers… - International journal of …, 2013 - Springer
This paper addresses the problem of generating possible object locations for use in object
recognition. We introduce selective search which combines the strength of both an …

Visual relationship detection with language priors

C Lu, R Krishna, M Bernstein, L Fei-Fei - … 11–14, 2016, Proceedings, Part I …, 2016 - Springer
Visual relationships capture a wide variety of interactions between pairs of objects in images
(e.g., “man riding bicycle” and “man pushing bicycle”). Consequently, the set of possible …

Videos as space-time region graphs

X Wang, A Gupta - Proceedings of the European …, 2018 - openaccess.thecvf.com
How do humans recognize the action "opening a book"? We argue that there are two
important cues: modeling temporal shape dynamics and modeling functional relationships …

Computer vision: algorithms and applications

R Szeliski - 2022 - books.google.com
Humans perceive the three-dimensional structure of the world with apparent ease. However,
despite all of the recent advances in computer vision research, the dream of having a …

Unsupervised learning of visual representations using videos

X Wang, A Gupta - … of the IEEE international conference on …, 2015 - openaccess.thecvf.com
Is strong supervision necessary for learning a good visual representation? Do we really
need millions of semantically-labeled images to train a Convolutional Neural Network …

LabelMe: a database and web-based tool for image annotation

BC Russell, A Torralba, KP Murphy… - International journal of …, 2008 - Springer
We seek to build a large collection of images with ground truth labels to be used for object
detection and recognition research. Such data is useful for supervised learning and …

Shuffle and learn: unsupervised learning using temporal order verification

I Misra, CL Zitnick, M Hebert - … , The Netherlands, October 11–14, 2016 …, 2016 - Springer
In this paper, we present an approach for learning a visual representation from the raw
spatiotemporal signals in videos. Our representation is learned without supervision from …