MMSFT: Multilingual multimodal summarization by fine-tuning transformers

S Phani, A Abdul, MKS Prasad, HKD Sarma - IEEE Access, 2024 - ieeexplore.ieee.org
Multilingual multimodal (MM) summarization, involving the processing of multimodal input
(MI) data across multiple languages to generate corresponding multimodal summaries (MS) …

Multi-Scale Features Are Effective for Multi-Modal Classification: An Architecture Search Viewpoint

P Fu, X Liang, Y Qian, Q Guo, Y Zhang… - … on Circuits and …, 2024 - ieeexplore.ieee.org
Multi-modal neural architecture search (MNAS) is an effective approach to obtain
task-adaptive multi-modal classification models. Deep neural networks, as currently mainstream …

Tackling Real-world Complexity: Hierarchical Modeling and Dynamic Prompting for Multimodal Long Document Classification

T Liu, Y Hu, M Li, J Yi, X Chang… - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
With the rapid growth of internet content, multimodal long document data has become
increasingly prominent, drawing significant attention from researchers. However, most …

Content and Relation Fuzzy Mitigation Framework for Intent Perception

W Zhang, H Qi, S Wang, Z Lin… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In this paper, we tackle the content fuzzy and relation fuzzy in image-based intent
perception. Current research primarily focuses on additional modeling mechanisms and …

AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding

Y Wang, W Zhou, H Feng, H Li - arXiv preprint arXiv:2408.16986, 2024 - arxiv.org
Over the past few years, the advancement of Multimodal Large Language Models (MLLMs)
has captured the wide interest of researchers, leading to numerous innovations to enhance …