Bridging the gap: A unified video comprehension framework for moment retrieval and highlight detection

Y Xiao, Z Luo, Y Liu, Y Ma, H Bian… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract Video Moment Retrieval (MR) and Highlight Detection (HD) have attracted
significant attention due to the growing demand for video analysis. Recent approaches treat …

Mind the interference: Retaining pre-trained knowledge in parameter efficient continual learning of vision-language models

L Tang, Z Tian, K Li, C He, H Zhou, H Zhao, X Li… - … on Computer Vision, 2024‏ - Springer
This study addresses the Domain-Class Incremental Learning problem, a realistic but
challenging continual learning scenario where both the domain distribution and target …

Intelligent electronic components waste detection in complex occlusion environments based on the focusing dynamic channel-you only look once model

H Liu, Y Jiang, W Zhang, Y Li, W Ma - Journal of Cleaner Production, 2025‏ - Elsevier
The exponential increase in electronic waste has become a major worldwide issue, driven
by the rapid technological advances and the proliferation of the consumer electronics …

UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment

H Zhou, L Tang, R Yang, G Qin, Y Zhang, R Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) aim to simulate
human subjective perception of image visual quality and aesthetic appeal. Existing methods …

GrootVL: Tree Topology is All You Need in State Space Model

Y Xiao, L Song, S Huang, J Wang, S Song, Y Ge… - arXiv preprint arXiv …, 2024 - arxiv.org
The state space models, employing recursively propagated features, demonstrate strong
representation capabilities comparable to Transformer models and superior efficiency …

RPEE-HEADS: A Novel Benchmark for Pedestrian Head Detection in Crowd Videos

M Abubaker, Z Alsadder, H Abdelhaq, M Boltes… - arXiv preprint arXiv …, 2024 - arxiv.org
The automatic detection of pedestrian heads in crowded environments is essential for crowd
analysis and management tasks, particularly in high-risk settings such as railway platforms …

Video Object Segmentation with Dynamic Query Modulation

H Zhou, R Hu, X Li - arXiv preprint arXiv:2403.11529, 2024 - arxiv.org
Storing intermediate frame segmentations as memory for long-range context modeling,
spatial-temporal memory-based methods have recently showcased impressive results in …

UniTracker: transformer-based CrossUnihead for multi-object tracking

F Wu, Y Zhang - Journal of Real-Time Image Processing, 2024‏ - Springer
In recent years, tracking-by-detection (TBD) has emerged as the predominant approach for
Multi-object Tracking (MOT). Most TBD algorithms typically employ separate branch heads …

BMDCNet: A Satellite Imagery Road Extraction Algorithm based on Multi-level Road Feature

C Wang, J Lu, Z Chen - IEEE Geoscience and Remote Sensing …, 2024‏ - ieeexplore.ieee.org
Multilevel road feature extraction from remote sensing image plays an important role in
numerous applications such as autonomous driving and urban planning. However …

IAFI-FCOS: Intra-and across-layer feature interaction FCOS model for lesion detection of CT images

Q Guan, M Pan, F Chen, Z Yang, Z Yu… - … Joint Conference on …, 2024‏ - ieeexplore.ieee.org
Effective lesion detection in medical image is not only rely on the features of lesion region,
but also deeply relative to the surrounding information. However, most current methods have …