Unveiling encoder-free vision-language models

H Diao, Y Cui, X Li, Y Wang, H Lu, X Wang - arXiv preprint arXiv …, 2024 - arxiv.org
Existing vision-language models (VLMs) mostly rely on vision encoders to extract visual
features followed by large language models (LLMs) for visual-language tasks. However, the …

Deep metric learning in projected-hypersphere space

Y Xu, Z Chen, J Hu - Pattern Recognition, 2025 - Elsevier
Distance metric learning is a subfield of machine learning that aims to learn a discriminative
space in which samples of the same class are closer and samples of different classes are …

KARST: Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission for Visual Classification

Y Zhu, H Diao, S Gao, L Chen, H Lu - arXiv preprint arXiv:2502.06779, 2025 - arxiv.org
Fine-tuning pre-trained vision models for specific tasks is a common practice in computer
vision. However, this process becomes more expensive as models grow larger. Recently …

t-HNE: A Text-guided Hierarchical Noise Eliminator for Multimodal Sentiment Analysis

Z Li, L Li - Proceedings of the 31st International Conference on …, 2025 - aclanthology.org
In the Multimodal Sentiment Analysis task, most existing approaches focus on
extracting modality-consistent information from raw unimodal data and integrating it into …