Remote sensing image scene classification meets deep learning: Challenges, methods, benchmarks, and opportunities
Remote sensing image scene classification, which aims at labeling remote sensing images
with a set of semantic categories based on their contents, has broad applications in a range …
with a set of semantic categories based on their contents, has broad applications in a range …
A review of object detection based on deep learning
With the rapid development of deep learning techniques, deep convolutional neural
networks (DCNNs) have become more important for object detection. Compared with …
networks (DCNNs) have become more important for object detection. Compared with …
Multiview transformers for video recognition
Video understanding requires reasoning at multiple spatiotemporal resolutions--from short
fine-grained motions to events taking place over longer durations. Although transformer …
fine-grained motions to events taking place over longer durations. Although transformer …
Scale-mae: A scale-aware masked autoencoder for multiscale geospatial representation learning
Large, pretrained models are commonly finetuned with imagery that is heavily augmented to
mimic different conditions and scales, with the resulting models used for various tasks with …
mimic different conditions and scales, with the resulting models used for various tasks with …
Fine-grained image analysis with deep learning: A survey
Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer
vision and pattern recognition, and underpins a diverse set of real-world applications. The …
vision and pattern recognition, and underpins a diverse set of real-world applications. The …
Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding
Decoding visual stimuli from brain recordings aims to deepen our understanding of the
human visual system and build a solid foundation for bridging human and computer vision …
human visual system and build a solid foundation for bridging human and computer vision …
Yolov4: Optimal speed and accuracy of object detection
There are a huge number of features which are said to improve Convolutional Neural
Network (CNN) accuracy. Practical testing of combinations of such features on large …
Network (CNN) accuracy. Practical testing of combinations of such features on large …
Strip pooling: Rethinking spatial pooling for scene parsing
Spatial pooling has been proven highly effective to capture long-range contextual
information for pixel-wise prediction tasks, such as scene parsing. In this paper, beyond …
information for pixel-wise prediction tasks, such as scene parsing. In this paper, beyond …
P2T: Pyramid pooling transformer for scene understanding
Recently, the vision transformer has achieved great success by pushing the state-of-the-art
of various vision tasks. One of the most challenging problems in the vision transformer is that …
of various vision tasks. One of the most challenging problems in the vision transformer is that …
Attention-based VGG-16 model for COVID-19 chest X-ray image classification
Computer-aided diagnosis (CAD) methods such as Chest X-rays (CXR)-based method is
one of the cheapest alternative options to diagnose the early stage of COVID-19 disease …
one of the cheapest alternative options to diagnose the early stage of COVID-19 disease …