Beyond Token Compression: A Training-Free Reduction Framework for Efficient Visual Processing in MLLMs

H Li, J Zhang, W Liao, D Peng, K Ding, L ** - arXiv preprint arXiv …, 2025 - arxiv.org
Multimodal Large Language Models (MLLMs) are typically based on decoder-only or cross-attention architectures. While decoder-only MLLMs outperform their cross-attention …

Knowledge Distillation in Mixture of Experts for Multi-Modal Medical LLMs

M Nathani, R Soni, R Mishra - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Large Language Models (LLMs) have made significant strides in recent years, performing a wide range of complex tasks across domains. However, in specialized fields like healthcare …