PrefixKV: Adaptive Prefix KV Cache Is What Vision Instruction-Following Models Need for Efficient Generation
Recently, large vision-language models (LVLMs) have rapidly gained popularity for their
strong generation and reasoning capabilities given diverse multimodal inputs. However …
A Survey on Inference Optimization Techniques for Mixture of Experts Models
J Liu, P Tang, W Wang, Y Ren, X Hou, PA Heng… - arXiv preprint arXiv…, 2024 - arxiv.org
The emergence of large-scale Mixture of Experts (MoE) models has marked a significant
advancement in artificial intelligence, offering enhanced model capacity and computational …
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
The success of Large Language Models (LLM) has led researchers to explore Multimodal
Large Language Models (MLLM) for unified visual and linguistic understanding. However …
Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models
Multi-modal large language models (MLLMs) have achieved remarkable success in fine-
grained visual understanding across a range of tasks. However, they often encounter …
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
The recent surge in high-quality visual instruction tuning samples from closed-source vision-
language models (VLMs) such as GPT-4V has accelerated the release of open-source …
MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders
Visual encoders are fundamental components in vision-language models (VLMs), each
showcasing unique strengths derived from various pre-trained visual foundation models. To …
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model
Vision-Language Models (VLMs) bring powerful understanding and reasoning capabilities
to multimodal tasks. Meanwhile, the great need for capable artificial intelligence on mobile …
Learning to Inference Adaptively for Multimodal Large Language Models
Multimodal Large Language Models (MLLMs) have shown impressive capabilities
in reasoning, yet come with substantial computational cost, limiting their deployment in …
Towards Better Adaptation of Foundation Models
Z Xu - pages.cs.wisc.edu
Foundation models have revolutionized artificial intelligence, yet fundamental challenges
remain in understanding and optimizing their capabilities in adaptation and inference. This …