- Academic Search

Unveiling encoder-free vision-language models

H Diao, Y Cui, X Li, Y Wang, H Lu, X Wang - arXiv preprint arXiv …, 2024 - arxiv.org

Existing vision-language models (VLMs) mostly rely on vision encoders to extract visual
features followed by large language models (LLMs) for visual-language tasks. However, the …

Cited by 19; 4 versions

Deep metric learning in projected-hypersphere space

Y Xu, Z Chen, J Hu - Pattern Recognition, 2025 - Elsevier

Distance metric learning is a subfield of machine learning that aims to learn a discriminative
space in which samples of the same class are closer and samples of different classes are …

2 versions

[PDF] arxiv.org

KARST: Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission for Visual Classification

Y Zhu, H Diao, S Gao, L Chen, H Lu - arXiv preprint arXiv:2502.06779, 2025 - arxiv.org

Fine-tuning pre-trained vision models for specific tasks is a common practice in computer
vision. However, this process becomes more expensive as models grow larger. Recently …

[PDF] aclanthology.org

t-HNE: A Text-guided Hierarchical Noise Eliminator for Multimodal Sentiment Analysis

Z Li, L Li - Proceedings of the 31st International Conference on …, 2025 - aclanthology.org

In the Multimodal Sentiment Analysis task, most existing approaches focus on
extracting modality-consistent information from raw unimodal data and integrating it into …

Saved in My library

GSSF: Generalized structural sparse function for deep cross-modal metric learning

Unveiling encoder-free vision-language models

Deep metric learning in projected-hypersphere space

KARST: Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission for Visual Classification

t-HNE: A Text-guided Hierarchical Noise Eliminator for Multimodal Sentiment Analysis