Remote sensing image scene classification meets deep learning: Challenges, methods, benchmarks, and opportunities

G Cheng, X Xie, J Han, L Guo… - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org
Remote sensing image scene classification, which aims at labeling remote sensing images
with a set of semantic categories based on their contents, has broad applications in a range …

A review of object detection based on deep learning

Y Xiao, Z Tian, J Yu, Y Zhang, S Liu, S Du… - Multimedia Tools and …, 2020 - Springer
With the rapid development of deep learning techniques, deep convolutional neural
networks (DCNNs) have become more important for object detection. Compared with …

Multiview transformers for video recognition

S Yan, X Xiong, A Arnab, Z Lu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Video understanding requires reasoning at multiple spatiotemporal resolutions--from short
fine-grained motions to events taking place over longer durations. Although transformer …

Scale-MAE: A scale-aware masked autoencoder for multiscale geospatial representation learning

CJ Reed, R Gupta, S Li, S Brockman… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large, pretrained models are commonly finetuned with imagery that is heavily augmented to
mimic different conditions and scales, with the resulting models used for various tasks with …

Fine-grained image analysis with deep learning: A survey

XS Wei, YZ Song, O Mac Aodha, J Wu… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer
vision and pattern recognition, and underpins a diverse set of real-world applications. The …

Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding

Z Chen, J Qing, T Xiang, WL Yue… - Proceedings of the …, 2023 - openaccess.thecvf.com
Decoding visual stimuli from brain recordings aims to deepen our understanding of the
human visual system and build a solid foundation for bridging human and computer vision …

YOLOv4: Optimal speed and accuracy of object detection

A Bochkovskiy, CY Wang, HYM Liao - arXiv preprint arXiv:2004.10934, 2020 - arxiv.org
There are a huge number of features which are said to improve Convolutional Neural
Network (CNN) accuracy. Practical testing of combinations of such features on large …

Strip pooling: Rethinking spatial pooling for scene parsing

Q Hou, L Zhang, MM Cheng… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Spatial pooling has been proven highly effective to capture long-range contextual
information for pixel-wise prediction tasks, such as scene parsing. In this paper, beyond …

P2T: Pyramid pooling transformer for scene understanding

YH Wu, Y Liu, X Zhan… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Recently, the vision transformer has achieved great success by pushing the state-of-the-art
of various vision tasks. One of the most challenging problems in the vision transformer is that …

Attention-based VGG-16 model for COVID-19 chest X-ray image classification

C Sitaula, MB Hossain - Applied Intelligence, 2021 - Springer
Computer-aided diagnosis (CAD) methods, such as chest X-ray (CXR)-based methods, are among
the cheapest alternative options for diagnosing the early stage of COVID-19 disease …