Review of image classification algorithms based on convolutional neural networks

L Chen, S Li, Q Bai, J Yang, S Jiang, Y Miao - Remote Sensing, 2021 - mdpi.com
Image classification has always been a hot research direction in the world, and the
emergence of deep learning has promoted the development of this field. Convolutional …

A survey of modern deep learning based object detection models

SSA Zaidi, MS Ansari, A Aslam, N Kanwal… - Digital Signal …, 2022 - Elsevier
Object Detection is the task of classification and localization of objects in an image or video.
It has gained prominence in recent years due to its widespread applications. This article …

Cross-city matters: A multimodal remote sensing benchmark dataset for cross-city semantic segmentation using high-resolution domain adaptation networks

D Hong, B Zhang, H Li, Y Li, J Yao, C Li… - Remote Sensing of …, 2023 - Elsevier
Artificial intelligence (AI) approaches nowadays have gained remarkable success in single-
modality-dominated remote sensing (RS) applications, especially with an emphasis on …

AlphaPose: Whole-body regional multi-person pose estimation and tracking in real-time

HS Fang, J Li, H Tang, C Xu, H Zhu… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org
Accurate whole-body multi-person pose estimation and tracking is an important yet
challenging topic in computer vision. To capture the subtle actions of humans for complex …

Sequential modeling enables scalable learning for large vision models

Y Bai, X Geng, K Mangalam, A Bar… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this we define a common …

GLIPv2: Unifying localization and vision-language understanding

H Zhang, P Zhang, X Hu, YC Chen… - Advances in …, 2022 - proceedings.neurips.cc
We present GLIPv2, a grounded VL understanding model, that serves both localization tasks
(e.g., object detection, instance segmentation) and Vision-Language (VL) understanding …

PoseFormerV2: Exploring frequency domain for efficient and robust 3D human pose estimation

Q Zhao, C Zheng, M Liu, P Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recently, transformer-based methods have gained significant success in sequential 2D-to-
3D lifting human pose estimation. As a pioneering work, PoseFormer captures spatial …

Learning enriched features for fast image restoration and enhancement

SW Zamir, A Arora, S Khan, M Hayat… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Given a degraded input image, image restoration aims to recover the missing high-quality
image content. Numerous applications demand effective image restoration, e.g., …

MPViT: Multi-path vision transformer for dense prediction

Y Lee, J Kim, J Willette… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Dense computer vision tasks such as object detection and segmentation require effective
multi-scale feature representation for detecting or classifying objects or regions with varying …

MHFormer: Multi-hypothesis transformer for 3D human pose estimation

W Li, H Liu, H Tang, P Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Estimating 3D human poses from monocular videos is a challenging task due to depth
ambiguity and self-occlusion. Most existing works attempt to solve both issues by exploiting …