PrefixKV: Adaptive Prefix KV Cache Is What Vision Instruction-Following Models Need for Efficient Generation
Recently, large vision-language models (LVLMs) have rapidly gained popularity for their
strong generation and reasoning capabilities given diverse multimodal inputs. However …
A Survey on Inference Optimization Techniques for Mixture of Experts Models
J Liu, P Tang, W Wang, Y Ren, X Hou, PA Heng… - arXiv preprint arXiv…, 2024 - arxiv.org
The emergence of large-scale Mixture of Experts (MoE) models has marked a significant
advancement in artificial intelligence, offering enhanced model capacity and computational …
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
The success of Large Language Models (LLM) has led researchers to explore Multimodal
Large Language Models (MLLM) for unified visual and linguistic understanding. However …
Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models
Multi-modal large language models (MLLMs) have achieved remarkable success in fine-
grained visual understanding across a range of tasks. However, they often encounter …
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
The recent surge in high-quality visual instruction tuning samples from closed-source vision-
language models (VLMs) such as GPT-4V has accelerated the release of open-source …
MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders
Visual encoders are fundamental components in vision-language models (VLMs), each
showcasing unique strengths derived from various pre-trained visual foundation models. To …
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model
Vision-Language Models (VLMs) bring powerful understanding and reasoning capabilities
to multimodal tasks. Meanwhile, the great need for capable artificial intelligence on mobile …
Learning to Inference Adaptively for Multimodal Large Language Models
Multimodal Large Language Models (MLLMs) have shown impressive capabilities
in reasoning, yet come with substantial computational cost, limiting their deployment in …
Towards Better Adaptation of Foundation Models
Z Xu - pages.cs.wisc.edu
Foundation models have revolutionized artificial intelligence, yet fundamental challenges
remain in understanding and optimizing their capabilities in adaptation and inference. This …