RepViT: Revisiting mobile CNN from ViT perspective

A Wang, H Chen, Z Lin, J Han… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Abstract: Recently, lightweight Vision Transformers (ViTs) have demonstrated superior performance
and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on …

PYRA: Parallel yielding re-activation for training-inference efficient task adaptation

Y Xiong, H Chen, T Hao, Z Lin, J Han, Y Zhang… - … on Computer Vision, 2024 - Springer
Recently, the scale of transformers has grown rapidly, which introduces considerable
challenges in terms of training overhead and inference efficiency in the scope of task …

Multi-Label Learning with Block Diagonal Labels

L Shen, S Zhao, Y Zhang, H Chen, J Zhou… - Proceedings of the …, 2024 - dl.acm.org
Collecting large-scale multi-label data with full labels is difficult in real-world scenarios.
Many existing studies have tried to address the issue of missing labels caused by annotation …

PrefixKV: Adaptive prefix KV cache is what vision instruction-following models need for efficient generation

A Wang, H Chen, J Tan, K Zhang, X Cai, Z Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, large vision-language models (LVLMs) have rapidly gained popularity for their
strong generation and reasoning capabilities given diverse multimodal inputs. However …

Text-region matching for multi-label image recognition with missing labels

L Ma, H Xie, L Wang, Y Fu, D Sun, H Zhao - Proceedings of the 32nd …, 2024 - dl.acm.org
Recently, large-scale visual language pre-trained (VLP) models have demonstrated
impressive performance across various downstream tasks. Motivated by these …

A Survey on Incomplete Multi-label Learning: Recent Advances and Future Trends

X Li, J Liu, X Wang, S Chen - arXiv preprint arXiv:2406.06119, 2024 - arxiv.org
In reality, data often exhibit associations with multiple labels, making multi-label learning
(MLL) a prominent research topic. The last two decades have witnessed the …

Rethinking the Effect of Uninformative Class Name in Prompt Learning

F Lv, C Nie, J Zhang, G Yang, G Lin, X Wu… - Proceedings of the 32nd …, 2024 - dl.acm.org
Large pre-trained vision-language models like CLIP have shown remarkable zero-shot
recognition performance. To adapt pre-trained vision-language models to downstream …

[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs

A Wang, F Sun, H Chen, Z Lin, J Han… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have recently demonstrated strong
performance across a wide range of vision-language tasks, garnering significant attention in …

SGTC: Semantic-Guided Triplet Co-training for Sparsely Annotated Semi-Supervised Medical Image Segmentation

K Yan, Q Cai, F Zhang, Z Cao, Z Liu - arXiv preprint arXiv:2412.15526, 2024 - arxiv.org
Although semi-supervised learning has made significant advances in the field of medical
image segmentation, fully annotating a volumetric sample slice by slice remains a costly and …

Boosting Single Positive Multi-label Classification with Generalized Robust Loss

Y Chen, C Li, X Dai, J Li, W Sun, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-label learning (MLL) requires comprehensive multi-semantic annotations that are hard to
fully obtain, often resulting in missing-label scenarios. In this paper, we investigate …