RepViT: Revisiting Mobile CNN From ViT Perspective
Recently, lightweight Vision Transformers (ViTs) have demonstrated superior performance
and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on …
PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation
Recently, the scale of transformers has grown rapidly, which introduces considerable
challenges in terms of training overhead and inference efficiency in the scope of task …
Multi-Label Learning with Block Diagonal Labels
Collecting large-scale multi-label data with full labels is difficult in real-world scenarios.
Many existing studies have tried to address the issue of missing labels caused by annotation …
PrefixKV: Adaptive Prefix KV Cache Is What Vision Instruction-Following Models Need for Efficient Generation
Recently, large vision-language models (LVLMs) have rapidly gained popularity for their
strong generation and reasoning capabilities given diverse multimodal inputs. However …
Text-Region Matching for Multi-Label Image Recognition with Missing Labels
Recently, large-scale vision-language pre-trained (VLP) models have demonstrated
impressive performance across various downstream tasks. Motivated by these …
A Survey on Incomplete Multi-label Learning: Recent Advances and Future Trends
In reality, data often exhibit associations with multiple labels, making multi-label learning
(MLL) a prominent research topic. The last two decades have witnessed the …
Rethinking the Effect of Uninformative Class Name in Prompt Learning
Large pre-trained vision-language models like CLIP have shown impressive zero-shot
recognition performance. To adapt pre-trained vision-language models to downstream …
[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs
Multimodal Large Language Models (MLLMs) have recently demonstrated strong
performance across a wide range of vision-language tasks, garnering significant attention in …
SGTC: Semantic-Guided Triplet Co-training for Sparsely Annotated Semi-Supervised Medical Image Segmentation
Although semi-supervised learning has made significant advances in the field of medical
image segmentation, fully annotating a volumetric sample slice by slice remains a costly and …
Boosting Single Positive Multi-label Classification with Generalized Robust Loss
Y Chen, C Li, X Dai, J Li, W Sun, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-label learning (MLL) requires comprehensive multi-semantic annotations that are hard to
fully obtain, thus often resulting in missing labels scenarios. In this paper, we investigate …