MMSFT: Multilingual multimodal summarization by fine-tuning transformers
S Phani, A Abdul, MKS Prasad, HKD Sarma - IEEE Access, 2024 - ieeexplore.ieee.org
Multilingual multimodal (MM) summarization, involving the processing of multimodal input
(MI) data across multiple languages to generate corresponding multimodal summaries (MS) …
Multi-Scale Features Are Effective for Multi-Modal Classification: An Architecture Search Viewpoint
Multi-modal neural architecture search (MNAS) is an effective approach to obtain task-
adaptive multi-modal classification models. Deep neural networks, as currently mainstream …
Tackling Real-world Complexity: Hierarchical Modeling and Dynamic Prompting for Multimodal Long Document Classification
With the rapid growth of internet content, multimodal long document data has become
increasingly prominent, drawing significant attention from researchers. However, most …
Content and Relation Fuzzy Mitigation Framework for Intent Perception
W Zhang, H Qi, S Wang, Z Lin… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In this paper, we tackle the content fuzzy and relation fuzzy in image-based intent
perception. Current research primarily focuses on additional modeling mechanisms and …
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding
Over the past few years, the advancement of Multimodal Large Language Models (MLLMs)
has captured the wide interest of researchers, leading to numerous innovations to enhance …