RE-GZSL: Relation Extrapolation for Generalized Zero-Shot Learning
Unlike Conventional Zero-Shot Learning (CZSL) which only focuses on the recognition of
unseen classes by using a classifier trained on seen classes and semantic embeddings …
Generalization-Enhanced Few-Shot Object Detection in Remote Sensing
Object detection is a fundamental task in computer vision that involves accurately locating
and classifying objects within images or video frames. In remote sensing, this task is …
MoBox: Enhancing Video Object Segmentation with Motion-Augmented Box Supervision
We propose MoBox, a low-cost solution for semi-supervised video object segmentation that
requires only bounding boxes as manual annotations for training. Built upon a mature semi …
CILP-FGDI: Exploiting Vision-Language Model for Generalizable Person Re-Identification
H Zhao, L Qi, X Geng - IEEE Transactions on Information …, 2025 - ieeexplore.ieee.org
The Visual Language Model, known for its robust cross-modal capabilities, has been
extensively applied in various computer vision tasks. In this paper, we explore the use of …
Energy-Efficient Wireless Technology Recognition Method Using Time-Frequency Feature Fusion Spiking Neural Networks
Wireless Technology Recognition (WTR) distinguishes different wireless technologies by
analyzing characteristic features extracted from radio signals. While deep learning (DL) …
Unified Feature Consistency of Under-Performing Pixels and Valid Regions for Semi-Supervised Medical Image Segmentation
Existing semi-supervised medical image segmentation methods based on the teacher-
student model often employ unweighted pixel-level consistency loss, neglecting the varying …
Fast Sampling of Diffusion Models for Accelerated MRI using Dual Manifold Constraints
Diffusion models show great potential in solving inverse problems, including MRI
reconstruction. With its unique characteristics, medical imaging demands both efficiency and …
Semantic-Aware Late-Stage Supervised Contrastive Learning for Fine-Grained Action Recognition
Y Pan, Q Zhao, Y Zhang, Z Wang… - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
Fine-grained action recognition typically faces challenges with lower inter-class variances
and higher intra-class variances. Supervised contrastive learning is inherently suitable for …
TM2SP: A Transformer-based Multi-Level Spatiotemporal Feature Pyramid Network for Video Saliency Prediction
C Li, S Liu - IEEE Transactions on Circuits and Systems for …, 2025 - ieeexplore.ieee.org
This paper proposes an end-to-end video saliency prediction network model, termed TM2SP-
Net (Transformer-based Multi-level Spatiotemporal Feature Pyramid Network). Leveraging …
High-level Feature Guided Decoding for Semantic Segmentation
Existing pyramid-based upsamplers (e.g., SemanticFPN), although efficient, usually produce
less accurate results compared to dilation-based models when using the same backbone …