Object detection using deep learning, CNNs and vision transformers: A review

AB Amjoud, M Amrouch - IEEE Access, 2023 - ieeexplore.ieee.org
Detecting objects remains one of computer vision and image understanding applications'
most fundamental and challenging aspects. Significant advances in object detection have …

Automatic analysis of facial actions: A survey

B Martinez, MF Valstar, B Jiang… - IEEE transactions on …, 2017 - ieeexplore.ieee.org
As one of the most comprehensive and objective ways to describe facial expressions, the
Facial Action Coding System (FACS) has recently received significant attention. Over the …

Context encoding for semantic segmentation

H Zhang, K Dana, J Shi, Z Zhang… - Proceedings of the …, 2018 - openaccess.thecvf.com
Recent work has made significant progress in improving spatial resolution for pixelwise
labeling with Fully Convolutional Network (FCN) framework by employing Dilated/Atrous …

Unsupervised visual representation learning by context prediction

C Doersch, A Gupta, AA Efros - Proceedings of the IEEE …, 2015 - cv-foundation.org
This work explores the use of spatial context as a source of free and plentiful supervisory
signal for training a rich visual representation. Given only a large, unlabeled image …

Vision transformers for remote sensing image classification

Y Bazi, L Bashmal, MMA Rahhal, RA Dayil, NA Ajlan - Remote Sensing, 2021 - mdpi.com
In this paper, we propose a remote-sensing scene-classification method based on vision
transformers. These types of networks, which are now recognized as state-of-the-art models …

Visual relationship detection with language priors

C Lu, R Krishna, M Bernstein, L Fei-Fei - … 11–14, 2016, Proceedings, Part I …, 2016 - Springer
Visual relationships capture a wide variety of interactions between pairs of objects in images
(eg “man riding bicycle” and “man pushing bicycle”). Consequently, the set of possible …

[BOOK][B] Computer vision: algorithms and applications

R Szeliski - 2022 - books.google.com
Humans perceive the three-dimensional structure of the world with apparent ease. However,
despite all of the recent advances in computer vision research, the dream of having a …

Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories

S Lazebnik, C Schmid, J Ponce - 2006 IEEE computer society …, 2006 - ieeexplore.ieee.org
This paper presents a method for recognizing scene categories based on approximate
global geometric correspondence. This technique works by partitioning the image into …

Unsupervised learning of visual representations using videos

X Wang, A Gupta - … of the IEEE international conference on …, 2015 - openaccess.thecvf.com
Is strong supervision necessary for learning a good visual representation? Do we really
need millions of semantically-labeled images to train a Convolutional Neural Network …

LabelMe: a database and web-based tool for image annotation

BC Russell, A Torralba, KP Murphy… - International journal of …, 2008 - Springer
We seek to build a large collection of images with ground truth labels to be used for object
detection and recognition research. Such data is useful for supervised learning and …